After the success of SEA-VL Phase 1, we are proud to launch SEA-VL Phase 2!
We believe it’s high time to create a model that truly understands Southeast Asian culture and language. We want the model to reflect the visual and linguistic richness of the SEA region through diverse contributions: high-quality data curation, annotation, prompting, model training, and evaluation.
Contribution Guide
Earn points for every contribution and unlock rewards:
- 200+ points: Certificate of participation + merchandise (t-shirt and keychain)
- 300+ points: Co-authorship in the final project paper (not the experiment paper)
How to contribute: SEA-VL Phase 2 has five main tasks. Choose any task that aligns with your interests or expertise.
Task 1: Submit a culturally-relevant image with description for SEA
To build the vision-language model, we need a compilation of culturally relevant images representing all 11 Southeast Asian countries.
Any image that reflects an aspect of SEA culture is considered culturally relevant. This could include:
- food and cuisines (e.g., nasi goreng, pho, green curry)
- locations (e.g., Manila’s Escolta Street)
- events (e.g., Lunar New Year festivities)
- everyday cultural practices (e.g., eating with your hands)
As long as it connects to SEA culture, it’s a great fit. You may check this resource to see which images are suitable for this task.
Go to this form to submit your images.
For bulk image upload, follow this guide!
Task 2: Review Image-Description Pairs
To ensure quality images for the dataset used in training the model, we need local annotators to rate the submitted images from Task 1.
Contributors for this task must first pass this short screening test. Check our annotation guidelines here to learn more and to apply for the test. If you pass, you will receive a link to the annotation platform.
Task 3: Translate Benchmark Datasets
To ensure the proper evaluation and testing of vision language capabilities, we need help translating prompts from existing vision language model benchmarks, such as the Aya Vision Benchmark by Cohere Labs.
Contributors for this task will translate the English prompts into any one of the following languages in which they are fluent or native speakers: Thai, Standard Malay, Filipino/Tagalog, Lao, Khmer, Tamil, Mandarin Chinese, Burmese, Tetun, Bruneian Malay.
We need three (3) contributors per language for this task. Go to this form to contribute.
Task 4: Submit Visual Questions for Multicultural Images
To evaluate the visual understanding of the models we will train, we need to compile a dataset of high-quality questions derived from cultural images from SEA.
Contributors for this task will create original questions related to a given image (e.g., “What sport is played in this image?”) in any of the following languages in which they are fluent or native speakers: Indonesian, Thai, Standard Malay, Filipino/Tagalog, Tamil, Mandarin Chinese, Vietnamese, Burmese, Lao, Khmer, Tetun, Bruneian Malay.
Go to this form to contribute.
Task 5: Submit High-Quality Text Prompts for Image Generation
To evaluate the image generation capabilities of the models we will train, we need to compile a dataset of high-quality English-only prompts at three complexity levels.
To ensure the quality of the prompts, we need contributors who are native to, or deeply familiar with, the cultures of the SEA countries (Indonesia, Singapore, Philippines, Thailand, Malaysia, Vietnam, Brunei, Timor-Leste, Cambodia, Laos, Myanmar).
Example prompts for the Indonesian culture:
- Level 1 (Easy) - “Draw an image of people drinking cendol.”
- Level 2 (Medium) - “Draw an image of people drinking cendol with durian topping.”
- Level 3 (Hard) - “Draw an image of people drinking cendol with durian topping while wearing kebaya.”
Go to this form to contribute.
Task 6: Submit High-Quality Text Prompts
(To be opened at a later date)
Contribution Point System for Tasks
Each task has its own points-per-submission value, calibrated for the difficulty of the task and for the SEA country whose culture the submission relates to.
We give more weight to submissions (e.g., culturally relevant images for Task 1) related to Brunei, East Timor, Cambodia, Laos, and Myanmar, as they are heavily underrepresented compared to the other countries. We also want contributors from these countries to reach the contribution threshold faster.
| Tasks | Indonesia, Singapore, Philippines | Thailand, Malaysia, Vietnam | Brunei, East Timor, Cambodia, Laos, Myanmar |
|---|---|---|---|
| Task 1: Submit a SEA Culturally-Relevant Image | 0.5 | 1 | 3 |
| Task 2: Review Image-Description Pairs | 1 | 1 | 1 |
| Task 3: Translate Benchmark Datasets | 1 | 1 | 1.5 |
| Task 4: Submit Visual Questions for Multicultural Images | 1.5 | 1.5 | 2 |
| Task 5: Submit High-Quality Text Prompts for Image Generation | 1 | 1 | 1.5 |
Remember, reaching 200 or more points will guarantee a certificate of participation and merchandise (t-shirt and keychain), while reaching 300 or more points will earn co-authorship in the final project paper.
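To help plan your contributions, here is a minimal Python sketch of how the table and thresholds above combine. It is an illustration only, not the organizers' official tally: the group labels `A`, `B`, `C` and the function names `total_points`, `rewards`, and `submissions_needed` are hypothetical, and the sketch assumes every submission is accepted and earns exactly the value listed in the table.

```python
import math

# Points per submission, copied from the table above.
# "A", "B", "C" are hypothetical labels for the three country columns.
POINTS_PER_SUBMISSION = {
    1: {"A": 0.5, "B": 1.0, "C": 3.0},  # Task 1: culturally relevant image
    2: {"A": 1.0, "B": 1.0, "C": 1.0},  # Task 2: review image-description pairs
    3: {"A": 1.0, "B": 1.0, "C": 1.5},  # Task 3: translate benchmark datasets
    4: {"A": 1.5, "B": 1.5, "C": 2.0},  # Task 4: visual questions
    5: {"A": 1.0, "B": 1.0, "C": 1.5},  # Task 5: image generation prompts
}

COUNTRY_GROUP = {
    "Indonesia": "A", "Singapore": "A", "Philippines": "A",
    "Thailand": "B", "Malaysia": "B", "Vietnam": "B",
    "Brunei": "C", "East Timor": "C", "Cambodia": "C",
    "Laos": "C", "Myanmar": "C",
}

CERTIFICATE_THRESHOLD = 200    # certificate + merchandise
COAUTHORSHIP_THRESHOLD = 300   # co-authorship in the final project paper


def total_points(submissions):
    """Sum points for an iterable of (task, country, count) tuples."""
    return sum(
        POINTS_PER_SUBMISSION[task][COUNTRY_GROUP[country]] * count
        for task, country, count in submissions
    )


def rewards(points):
    """Return the rewards unlocked at a given point total."""
    unlocked = []
    if points >= CERTIFICATE_THRESHOLD:
        unlocked.append("certificate and merchandise")
    if points >= COAUTHORSHIP_THRESHOLD:
        unlocked.append("co-authorship")
    return unlocked


def submissions_needed(task, country, target):
    """Smallest number of submissions of one kind that reaches the target."""
    per_submission = POINTS_PER_SUBMISSION[task][COUNTRY_GROUP[country]]
    return math.ceil(target / per_submission)


if __name__ == "__main__":
    # 50 Task 1 images for Laos plus 40 Task 4 questions for Indonesia:
    # 50 * 3 + 40 * 1.5 = 210 points.
    plan = [(1, "Laos", 50), (4, "Indonesia", 40)]
    points = total_points(plan)
    print(points, rewards(points))
```

For example, under these assumptions a contributor would need 67 Task 1 images related to Laos to reach 200 points, compared with 400 images related to Indonesia.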
You can view the contribution tracker here. It is updated every weekend.
Project Timeline
- May 8, 2025 - Project public launch. Contributions for Tasks 1, 2, 3, 4, and 5 are open.
- November 8, 2025 - End of public contributions to Tasks 1, 2, 3, 4, and 5.
- January 2026 - Prepare models, data, and multiple paper releases.
- February 2026 - Prepare experiment paper submissions to ACL 2026. Publish the project paper on the SEACrowd website and arXiv.
Join the Community!
Check out our GitHub page, and join our Discord server. Everyone is welcome to discuss and ask questions there!