AI-Powered Dermatological Assistant: Bridging Healthcare Gaps Through Multimodal Intelligence

Millions of people lack access to specialized dermatological care because of geographic and technological disparities. We present a novel multimodal framework that combines image-based diagnosis with a visual question answering pipeline, powered by DINOv2 and a compressed LLaVA model. Our system supports accurate skin disease diagnosis and explanation, and is optimized for low-resource settings.

This project introduces a clinical-grade Visual Language Model (VLM) that performs dermatological diagnosis from images and natural language prompts. Our AI assistant is trained in four stages: auxiliary classification, medical reasoning, interaction optimization, and resource-efficient deployment through structured pruning. The final model achieves 82.05% diagnostic accuracy and a 9/10 patient interaction score while operating in under 4.5 GB of memory.
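To make the structured pruning stage more concrete, the following is a minimal sketch of one common variant: removing whole output units (rows of a weight matrix) with the smallest L2 norm. The criterion, the function name prune_rows_by_norm, the keep_ratio parameter, and the toy weights are all assumptions for illustration. The project's actual pruning criterion is not stated here and may differ.

```python
import math


def prune_rows_by_norm(weights, keep_ratio):
    """Keep the rows (whole output units) with the largest L2 norm.

    Hypothetical helper: illustrates magnitude-based structured pruning,
    not necessarily the criterion used in this project.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(weights) * keep_ratio))
    # One importance score per row: the L2 norm of its weights.
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    # Rank rows by score, keep the top n_keep, then restore original order.
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [weights[i] for i in kept], kept


# Toy 4x3 layer (made-up values): rows 1 and 3 carry almost no weight.
layer = [
    [0.9, -1.2, 0.4],
    [0.01, 0.02, -0.01],
    [-0.7, 0.3, 1.1],
    [0.05, -0.03, 0.0],
]
pruned, kept = prune_rows_by_norm(layer, keep_ratio=0.5)
print(kept)  # [0, 2]
```

Because entire rows are removed, the resulting matrix is smaller but still dense, which is why structured pruning can reduce memory use without requiring sparse kernels.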

Key contributions:

Integration of DINOv2 and LLaVA for robust image-text understanding.
Domain-specific fine-tuning and question-answering for medical settings.
Progressive enhancement through reasoning, DPO, and pruning.
Potential for local and global impact, especially in under-resourced areas.
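As an illustration of the DPO (Direct Preference Optimization) step named in the contributions, here is a small sketch of the standard DPO loss for a single preference pair. It assumes the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses are already available from the policy and from a frozen reference model. The function name dpo_loss, the beta value, and the example numbers are illustrative assumptions, not settings taken from this project.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (beta is an assumed value)."""
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x); computed in a numerically stable way.
    x = -logits
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the loss equals log(2).
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Policy raises the chosen response and lowers the rejected one: lower loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(neutral > improved)  # True
```

The loss falls as the policy assigns relatively more probability to the preferred answer than the reference model does, which is how preference data can shape the assistant's interaction style without a separate reward model.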

📄 View Poster Below: Methodology figure

Tanvir Ahmed Khan