{"slug":"phi-4-multimodal","name":"Phi-4 Multimodal","kind":"tool","kindLabel":"Tool","grade":null,"score":null,"tagline":"Multimodal document understanding with integrated speech and vision in a compact model","description":"A compact multimodal model that processes text, image, and audio inputs with a 128K-token context and supports OCR as well as chart and table understanding.","tags":["Document Processing","multimodal","speech","ocr"],"url":"https://way.space/tool/phi-4-multimodal","stars":"—","website":"https://huggingface.co/microsoft/Phi-4-multimodal-instruct"}