{"slug":"cogvlm2","name":"CogVLM2","kind":"tool","kindLabel":"Tool","grade":null,"score":null,"tagline":"General VQA, document Q&A, and GUI understanding with high-resolution inputs","description":"A vision-language model built on Llama3-8B that supports high-resolution inputs and excels in multi-turn dialogues over visually rich documents.","tags":["Document Processing","visual-qa","document-qa","gui-understanding"],"url":"https://way.space/tool/cogvlm2","stars":"—","repo":"https://github.com/THUDM/CogVLM2"}