Qwen-2.5-Omni

A vision-language-audio model with speech input and output, plus chart, document, and image understanding.

01 / About

About Qwen-2.5-Omni.

Qwen-2.5-Omni is a vision-language-audio model with speech input and output. It also understands charts, documents, and images, and supports speech-to-text, text-to-speech, and speech-to-speech.

Reach for it when an agent needs to combine speech, vision, and document handling in one model, for example reasoning over an image or document and replying in speech.

03 / Related

More to explore.

Browser Use

Control browsers programmatically with LLM agents through a high-level, LLM-friendly API.

Cavegemma

JuliusBrussee

An experimental LoRA fine-tune of Gemma to speak caveman-mode natively.

caveman-code

JuliusBrussee

A TypeScript implementation of the caveman compression engine.

claude-context-optimizer

egorfedorov

Tracks token usage, blocks redundant reads, and supports .contextignore and budget alerts.

claude-rolling-context

NodeNestor

A proxy plugin that rolls context compression past 100K tokens.

claude-token-optimizer

nadimtuhin

Restructures CLAUDE.md and docs for roughly 90% token savings with a CLI audit and compress.

04 / Build

Build with Qwen-2.5-Omni.

Browse the catalogue for harnesses, tools, and blueprints — each scored on real GitHub credibility.