SEA-VL Phase 2

After the success of SEA-VL Phase 1, we are proud to launch SEA-VL Phase 2!

We believe it’s high time to create a model that truly understands Southeast Asian culture and language. We want the model to reflect the visual and linguistic richness of the SEA region through diverse contributions: high-quality data curation, annotation, prompting, model training, and evaluation.

Contribution Guide

Earn points for every contribution and unlock rewards:

  • 200+ points: Certificate of participation + merchandise (t-shirt and keychain)
  • 300+ points: Co-authorship in the final project paper (not the experimental paper)

How to contribute: SEA-VL Phase 2 has five main tasks. Choose any task that aligns with your interests or expertise.

Task 1: Submit a culturally-relevant image with description for SEA

To build the vision-language model, we need a compilation of culturally relevant images representing all 11 Southeast Asian countries.

Any image that reflects an aspect of SEA culture is considered culturally relevant. This could include:

  • food and cuisines (e.g., nasi goreng, pho, green curry)
  • locations (e.g., Manila’s Escolta Street)
  • events (e.g., Lunar New Year festivities)
  • everyday cultural practices (e.g., eating with your hands).

As long as it connects to SEA culture, it’s a great fit. You may check this resource to see which images are suitable for this task.

Go to this form to submit your images.

For bulk image upload, follow this guide!

Task 2: Review Image-Description Pairs

To ensure quality images for the dataset used in training the model, we need local annotators to rate the submitted images from Task 1.

Contributors for this task must first pass this short screening test. Check our annotation guideline here to learn more and to apply for the screening test. You will receive a link to the annotation platform if you pass the screening test.

Task 3: Translate Benchmark Datasets

To ensure the proper evaluation and testing of vision language capabilities, we need help translating prompts from existing vision language model benchmarks, such as the Aya Vision Benchmark by Cohere Labs.

Contributors for this task will translate the English prompts into any one of the following languages in which they are fluent or a native speaker: Thai, Standard Malay, Filipino/Tagalog, Lao, Khmer, Tamil, Mandarin Chinese, Burmese, Tetun, Bruneian Malay.

We need three contributors per language for this task. Go to this form to contribute.

Task 4: Submit Visual Questions for Multicultural Images

To evaluate the visual understanding of the models we will train, we need to compile a dataset of high-quality questions derived from cultural images from SEA.

Contributors for this task will create original questions related to a given image (e.g., “What sport is played in this image?”) in any of the following languages in which they are fluent or a native speaker: Indonesian, Thai, Standard Malay, Filipino/Tagalog, Tamil, Mandarin Chinese, Vietnamese, Burmese, Lao, Khmer, Tetun, Bruneian Malay.

Go to this form to contribute.

Task 5: Submit High-Quality Text Prompts for Image Generation

To evaluate the image generation capabilities of the models we will train, we need to compile a dataset of high-quality English-only prompts at three complexity levels.

To ensure the quality of the prompts, we need contributors who are native to, or deeply familiar with, the cultures of the SEA countries (Indonesia, Singapore, Philippines, Thailand, Malaysia, Vietnam, Brunei, Timor Leste, Cambodia, Laos, Myanmar).

Example prompts for the Indonesian culture:

  • Level 1 (Easy) - “Draw an image of people drinking cendol.”
  • Level 2 (Medium) - “Draw an image of people drinking cendol with durian topping.”
  • Level 3 (Hard) - “Draw an image of people drinking cendol with durian topping while wearing kebaya.”

Go to this form to contribute.

Task 6: Submit High-Quality Text Prompts

(To be opened at a later date)

Contribution Point System for Tasks

Each task has its own points-per-submission value, calibrated for the task's difficulty and for the SEA country whose culture the submission relates to.

We give more weight to submissions (e.g., culturally relevant images for Task 1) related to Brunei, East Timor, Cambodia, Laos, and Myanmar, as they are heavily underrepresented compared to the other countries. We also want contributors from these countries to reach the contribution threshold faster.

| Task | Indonesia, Singapore, Philippines | Thailand, Malaysia, Vietnam | Brunei, East Timor, Cambodia, Laos, Myanmar |
| --- | --- | --- | --- |
| Task 1: Submit a SEA Culturally-Relevant Image | 0.5 | 1 | 3 |
| Task 2: Review Image-Description Pairs | 1 | 1 | 1 |
| Task 3: Translate Benchmark Datasets | 1 | 1 | 1.5 |
| Task 4: Submit Visual Questions for Multicultural Images | 1.5 | 1.5 | 2 |
| Task 5: Submit High-Quality Text Prompts for Image Generation | 1 | 1 | 1.5 |

Remember, reaching 200 or more points will guarantee a certificate of participation and merchandise (t-shirt and keychain), while reaching 300 or more points will earn co-authorship in the final project paper.
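
To get a rough sense of how submissions add up, here is a minimal Python sketch of the point table and reward thresholds above. It is an illustration only, not the official tracker: the names `POINTS`, `COUNTRY_GROUP`, `total_points`, and `rewards` are hypothetical, and it assumes that points from different tasks simply add together and that every counted submission was accepted.

```python
# Points per submission, copied from the table above.
# Group labels "A", "B", "C" are hypothetical shorthand for the three columns.
POINTS = {
    1: {"A": 0.5, "B": 1.0, "C": 3.0},  # Task 1: culturally relevant image
    2: {"A": 1.0, "B": 1.0, "C": 1.0},  # Task 2: review image-description pairs
    3: {"A": 1.0, "B": 1.0, "C": 1.5},  # Task 3: translate benchmark datasets
    4: {"A": 1.5, "B": 1.5, "C": 2.0},  # Task 4: visual questions
    5: {"A": 1.0, "B": 1.0, "C": 1.5},  # Task 5: image generation prompts
}

COUNTRY_GROUP = {
    "Indonesia": "A", "Singapore": "A", "Philippines": "A",
    "Thailand": "B", "Malaysia": "B", "Vietnam": "B",
    "Brunei": "C", "East Timor": "C", "Cambodia": "C",
    "Laos": "C", "Myanmar": "C",
}


def total_points(submissions):
    """Sum points for an iterable of (task_number, country, count) tuples.

    Assumption: points are additive across tasks and countries.
    """
    return sum(
        POINTS[task][COUNTRY_GROUP[country]] * count
        for task, country, count in submissions
    )


def rewards(points):
    """Return the rewards unlocked at a given point total."""
    earned = []
    if points >= 200:
        earned.append("certificate and merchandise")
    if points >= 300:
        earned.append("co-authorship in the final project paper")
    return earned


# Example: 40 Task 1 images related to Laos plus 60 Task 4 questions
# related to Indonesia.
example = [(1, "Laos", 40), (4, "Indonesia", 60)]
print(total_points(example))           # 40 * 3.0 + 60 * 1.5 = 210.0
print(rewards(total_points(example)))  # ['certificate and merchandise']
```

In this example, 210 points passes the 200-point threshold but not the 300-point one. The official contribution tracker remains the source of truth for your actual total.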

You can view the contribution tracker here. It is updated every weekend.

Project Timeline

  • May 8, 2025 - Project public launch. Contributions for Tasks 1, 2, 3, 4, and 5 are open.
  • November 8, 2025 - End of public contributions to Tasks 1, 2, 3, 4, and 5.
  • January 2026 - Prepare models, data, and multiple paper releases.
  • February 2026 - Prepare experimental paper submissions to ACL 2026. Publication of the project paper on the SEACrowd website and arXiv.

Join the Community!

Check out our GitHub page, and join our Discord server. Everyone is welcome to discuss and ask questions there!

FAQs

Any image that reflects an aspect of SEA culture is welcome! This could include food (e.g., eating Nasi Goreng), locations (e.g., Manila's Escolta Street), events (e.g., Lunar New Year festivities), or everyday cultural practices (e.g., eating with your hands). As long as it connects to SEA culture, it's a great fit!

Yes, as long as you took the image and still hold the copyright. All images will be openly licensed under the CC-BY-SA 4.0 license, so please ensure you own full rights to them before submission.

No, phone-quality images are perfectly acceptable as long as they're not blurry or obstructed.

Yes, images taken abroad are welcome if they are culturally relevant.

No, you do not.

Yes, you can select "Other..." as your native language in the submission form.

Coming soon!

No, contributors to the open tasks will not be listed on the ACL 2026 experimental papers. Those papers will primarily highlight model development results. However, if you reach 300 points, you will be credited as a co-author in our final organization-wide publication, which will summarize SEA-VL and the other SEACrowd projects in 2025. This paper will be published on the SEACrowd website and arXiv, recognizing all community contributors who helped build the dataset and evaluation resources.

Join our Discord server, ask us on #sea-vl or #discussion-forum, and we'll be happy to help!