SEA-VL Phase 1 | SEACrowd - AI for Southeast Asia, by Southeast Asians

Paper: ACL 2025, arXiv
Hugging Face Collection: Link

We’re excited to present a major milestone from the SEACrowd team: the launch of SEA-VL, the largest open-source vision-language (VL) dataset specifically designed to represent the cultural diversity of Southeast Asia 🇧🇳🇰🇭🇹🇱🇮🇩🇱🇦🇲🇾🇲🇲🇵🇭🇸🇬🇹🇭🇻🇳.

Why SEA-VL?

Most vision-language datasets today reflect Western-centric imagery and language, leaving Southeast Asian cultures underrepresented and misinterpreted. SEA-VL is our open-source initiative to change that. It is designed to better represent the languages, traditions, and everyday realities of Southeast Asian communities.

Highlights of the dataset include:

1.3 million culturally relevant image-text pairs
Coverage of all 11 Southeast Asian countries
50× larger than any previous SEA-focused VL dataset
Hosted on Hugging Face: Explore SEA-VL

How We Built SEA-VL

We combined several approaches to balance scale with cultural fidelity:

Crowdsourcing: high cultural accuracy, but slow and resource-intensive
Image Crawling: reaches roughly 85% cultural relevance and is highly scalable
Image Generation: still fails to reflect SEA cultures authentically and poses licensing challenges

For in-depth information on each approach, check out our paper.

What’s Next?

We extend our deepest thanks to the contributors across Southeast Asia who made this possible.

This is only the beginning: Phase 2 is on the horizon, and we invite researchers, practitioners, and community members to collaborate with us. Stay tuned on our Discord!

Together, let’s build AI that reflects the full spectrum of culture across Southeast Asia.