Molmo is a vision-language model trained on image-text data. It excels at reading dense text in images, such as charts or forms, and can point to answers directly within an image.
Reach for Molmo when working with documents that combine text and images, such as medical reports or annotated diagrams. It is a good fit for high-resolution multimodal inputs, visual QA, and GUI parsing.
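
To make the pointing ability concrete, here is a minimal sketch of how pointing output might be turned into pixel coordinates. It assumes the model returns points as tags of the form `<point x="..." y="..." alt="...">label</point>`, with `x` and `y` given as percentages of the image width and height. That format, the `parse_points` helper, and the sample string are illustrative assumptions, not something stated above, so check them against the model's actual output before relying on this.

```python
import re

# Assumed output shape (not confirmed by the text above):
#   <point x="61.5" y="40.4" alt="dog">dog</point>
# where x and y are percentages of the image width and height.
POINT_RE = re.compile(r'<point\s+x="([\d.]+)"\s+y="([\d.]+)"[^>]*>(.*?)</point>')


def parse_points(text, width, height):
    """Return (label, x_px, y_px) tuples for each point tag found in text."""
    points = []
    for x_pct, y_pct, label in POINT_RE.findall(text):
        # Convert percentage coordinates to pixel coordinates.
        x_px = float(x_pct) / 100.0 * width
        y_px = float(y_pct) / 100.0 * height
        points.append((label, round(x_px), round(y_px)))
    return points


# Hypothetical model response for an 800x600 image.
sample = 'The total is here: <point x="50.0" y="25.0" alt="total">total</point>'
print(parse_points(sample, width=800, height=600))  # [('total', 400, 150)]
```

The resulting pixel coordinates can then be drawn on the image or passed to a GUI automation step, for example to click the element the model pointed at.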
