Knowledge Distillation in Multilingual Vision-Text Models
Distill compact multilingual vision-text embeddings from large multimodal teachers for real-world deployment.
Mentees (4):
Ashvanth S, Faiz Assabil Firdaus, Ilma Aliya Fiddien, Puja Ahmad Habibi
Project Proposal
We propose a training framework to distill a small vision-text embedding model from a large multimodal teacher. Existing knowledge distillation (KD) approaches often assume a base-sized teacher and focus on monolingual settings, leaving large teachers and multilingual scenarios underexplored.
This project will design a KD framework that scales to large teacher models and supports multilingual vision-text embeddings. The resulting student model should be compact and efficient enough for real-world deployment, including on edge devices.
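The proposal does not fix a specific distillation objective. As one illustrative possibility, the sketch below matches the teacher's image-to-caption similarity distribution with the student's (a relational, similarity-based KD loss). Because only similarity matrices are compared, the student's embedding dimension can be smaller than the teacher's without a projection layer. The multilingual aspect is modeled by an assumption: the student encodes translations of the captions the teacher saw in English. The function names (`similarity_kd_loss`, `cross_modal_similarity`), the temperature of 0.07, and the toy vectors are all hypothetical; a real implementation would operate on batched tensors in a deep-learning framework.

```python
import math
from typing import List

Matrix = List[List[float]]  # one embedding vector per row


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_modal_similarity(images: Matrix, texts: Matrix) -> Matrix:
    """Cosine similarity of every image (row) against every caption (column)."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def softmax(row: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [x / temperature for x in row]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_kd_loss(teacher_img: Matrix, teacher_txt: Matrix,
                       student_img: Matrix, student_txt: Matrix,
                       temperature: float = 0.07) -> float:
    """Hypothetical relational KD objective (an assumption, not the project's method).

    For each image, computes KL(teacher || student) between the softmax
    distributions over the captions in the batch, then averages over images.
    Only similarities are compared, so embedding dimensions may differ.
    """
    t_sim = cross_modal_similarity(teacher_img, teacher_txt)
    s_sim = cross_modal_similarity(student_img, student_txt)
    loss = 0.0
    for t_row, s_row in zip(t_sim, s_sim):
        p = softmax(t_row, temperature)  # teacher's target distribution
        q = softmax(s_row, temperature)  # student's predicted distribution
        loss += sum(pi * (math.log(pi) - math.log(qi))
                    for pi, qi in zip(p, q) if pi > 0)
    return loss / len(t_sim)


if __name__ == "__main__":
    # Toy batch of 2 image-caption pairs with made-up values. Teacher vectors
    # are 4-d (English captions); student vectors are 2-d and assumed to come
    # from translations of the same captions into another language.
    teacher_img = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    teacher_txt_en = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    student_img = [[1.0, 0.0], [0.0, 1.0]]
    student_txt_translated = [[0.9, 0.1], [0.2, 0.8]]
    print(similarity_kd_loss(teacher_img, teacher_txt_en,
                             student_img, student_txt_translated))
```

The demo prints a positive loss because the student's translated-caption embeddings only approximately reproduce the teacher's similarity structure; the loss would reach 0.0 if the two similarity matrices matched exactly, despite the different embedding widths. A full framework might add further terms, such as a contrastive loss on the student's own image-caption pairs, which this sketch leaves out.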
Relevant publications:
- MIEB: Massive Image Embedding Benchmark