Resources

We open-source datasets, dataloaders, models, and other resources from our projects and flagship apprenticeship program. Follow us on HuggingFace or GitHub for updates.

For resources associated with our papers and preprints, please visit our publications page and look for the “Resources” link. We also curate papers by our affiliated members that advance our mission of building AI to represent Southeast Asia.

HuggingFace Collections

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark

Collection

NLP Multimodal

Comprehensive collection of Southeast Asian language datasets and benchmarks for NLP tasks.

SEA-VL: Multicultural VL Dataset for Southeast Asia

Collection

Vision-Language Southeast Asia

Vision-language datasets capturing the cultural diversity of Southeast Asian contexts.

HuggingFace Resources

Models

Explore our pre-trained models and fine-tuned variants for Southeast Asian languages.

Datasets

Access our comprehensive collection of Southeast Asian language and cultural datasets.

Spaces

Try out interactive demos and applications built with our models and datasets.