SEACrowd 2026 Apprentice Program

The application for 2026 Apprentice Program was closed on Dec 17, 2025 at 23:59 (Anywhere on Earth, UTC-12).

If you would like to be notified of future apprentice batches, please join our Google Group or follow us on X/Twitter, Facebook, or LinkedIn.

SEACrowd Apprentice Program is a remote research program (Feb 1 - Jun 30, 2026) that pairs experienced researchers with early-career researchers across Southeast Asia to build models, datasets, and publishable research. Small, mentored teams work on scoped projects with structured milestones and community support, creating a clear path toward PhD admissions, jobs, and stronger SEA regional capacity.

Each mentee will join a team of 2–3 mentors (at least 1 primary mentor and 1 secondary mentor) and 2+ fellow mentees to execute a research project over four months. The program emphasizes hands-on experience, mentorship, and peer learning, culminating in a research report, which can turn into paper submission to top AI/ML/NLP conferences as arranged by the team.

2025 Success Story

Our 2025 cohort featured three apprentice teams who successfully completed their projects, culminating in mentees’ first-authored research papers published at the 5th Multilingual Representation Learning Workshop (2025). Check out their publications!

We shared their journey and our learnings from running the first cohort in our Retrospective blog post.

2026 Research Topics

We offer five cutting-edge research projects:

Multilingual Agentic for Underrepresented Regions - Developing evaluation frameworks for agent-based language models in low-resource languages
CoRaL: Contextual Relevance and Linguistic Enrichment - Multi-dimensional data curation framework for low-resource language training corpora
Reasoning Agentic LLM Router - Skill-based routing mechanisms to reduce inference costs while maintaining performance
Selective Memory Layer Finetuning - Architectural solutions for continual learning using memory layers
Knowledge Distillation in Multilingual Vision-Text Models - Creating compact vision-language embeddings for edge devices

View detailed project topics →

Who Can Apply

There are no formal eligibility or age limits. We’re a growth-first programme and value potential, motivation, and effort more than credentials.

We welcome anyone who meets at least one of the following:

Your affiliation (e.g., school, organization, company) is from Southeast Asia (SEA).
You were born and raised in SEA (living there more than 10 years).
You are doing or have done SEA-related research.

and share our vision.

What Increases Your Chances

Bachelor’s degree with a publication or Master’s degree.
Clear AI research goals (pre-PhD programs, early-year PhD, or prior collaboration with mentors).
Fit with project topics, capability, motivation, and mentors (assessed via application + interview).

What You’ll Gain

Certificate of achievement upon completion
Letter of recommendation for PhD/job applications (for strong contributors)
Potential publication at top ML/AI/NLP venues (first authorship reserved for project leads)
Mentorship from experienced AI researchers
Peer network with similar research interests
Hands-on experience in collaborative AI research

Check out previous batch publications!

Application Process

General Application: Nov 17 - Dec 17, 2025 (23:59 UTC-12).
Selection: mid-Dec 2025 - Jan 19, 2026
- Round 1: Application screening
- Round 2: Online interview
Team announcement: mid-Jan 2026

Apply Here

Program Schedule

Kickoff: Feb 1, 2026
Mid-term milestone: End of Mar / early Apr (internal review)
End-term milestone: End of Jun (SEACrowd-wide + external committee)

Publications are encouraged when ready, not tied to specific conference deadlines.

Who You’ll Work With

Primary Mentors

Alham Fikri Aji

MBZUAI

Assistant Professor

Samuel Cahyawijaya

Cohere

Member of Technical Staff

Ekapol Chuangsuwanich

Chulalongkorn University

Fajri Koto

MBZUAI

Assistant Professor

Peerat Limkonchotiwat

AI Singapore

Research Fellow

Genta Indra Winata

Capital One AI Foundation,

Senior Applied Scientist

Secondary Mentors

Farid Adilazuarda

University of Edinburgh

PhD Student

M. Dehan Al-Kautsar

MBZUAI

NLP Researcher

David Anugraha

Stanford University

MSc Student

Patomporn Payoungkhamdee

VISTEC

PhD Student

M. Reza Qorib

NUS

Research Fellow

Pume Tuchinda

VISTEC

Research Assistant

Organizers & Research Managers

Aye Hninn Khine

KMUTT, SEACrowd

Holy Lovenia

SEACrowd

FAQs

Applications: Nov 17 – Dec 17, 2025 (23:59 UTC-12)
Selection: mid-Dec 2025 – Jan 19, 2026 (screening + interview)
Team announcement: mid-Jan 2026
Program dates: Feb 1 – Jun 30, 2026

We list the official milestones above. Since the program is remote and part-time, we expect conflicts to be manageable. Any major conflicts should be disclosed in your application, or smoothed out with your team if you’re selected.

There are no formal eligibility or age limits. You can apply if you meet at least one of the following:

Your affiliation (school, organization, or company) is from Southeast Asia (SEA)
You were born and raised in SEA (living there more than 10 years)
You are doing or have done SEA-related research

You can still qualify if your work connects to Southeast Asia. One clear way is to do research related to Southeast Asia, particularly in Machine Learning or Natural Language Processing.

Examples include work on SEA languages, regional datasets, or SEA-specific social or cultural AI challenges.

This alone does not qualify you for the program.

However, speaking a language from Southeast Asia can help, especially if it informs your research interests. We encourage you to highlight any relevant language skills and how they connect to your research goals in your application.

No, applications are individual. However, during the selection process, we may consider team dynamics and try to group mentees with complementary skills and interests.

Prior research experience or publications are not required but can strengthen your application. We value strong foundational knowledge in ML, potential, motivation, and fit with project topics and mentors more than credentials. If you have relevant experience, be sure to highlight it in your application.

We expect mentees to commit at least 10 hours per week (hard minimum). This includes time for meetings, research, coding, writing, and collaboration with mentors and peers. The recommended commitment is 15–20 hours per week for a more immersive experience.

We understand that mentees may have other commitments. The program is designed to be flexible and part-time. We recommend discussing your availability and commitments with your mentors and team members to ensure a manageable workload.

However, we cannot accommodate mentees who cannot commit the minimum required time.

The SEACrowd Apprentice Program is a remote research program.

Yes, the program is completely free to join for all selected mentees. We provide compute necessary for research projects.

The program is unpaid. However, mentees gain valuable research experience, mentorship, networking opportunities, and potential publication avenues.

Unfortunately, we do not have funds to provide stipends or financial support at this time. We recommend seeking external funding sources, scholarships, or institutional support if needed. Otherwise, we welcome you to apply in future cohorts when you have the necessary resources or when we may have funding available.

We use English as the primary language for all program communications, documentation, and deliverables. Communication channels will be team dependent, but we expect teams to primarily use Discord for day-to-day communication, and Google Meet / Zoom / Microsoft Teams for meetings. How frequent meetings are will be up to each team to decide, but likely weekly or bi-weekly check-ins with mentors.

No, mentees can only be accepted to work on one project per cohort to ensure focus and commitment.

Once accepted, mentees are expected to commit to their assigned project for the duration of the program. Switching projects is generally not allowed, as it can disrupt team dynamics and project continuity. If you have significant concerns, please discuss them with the program organizers.

For the 2026 cohort, we are only accepting applications for the predefined research topics listed on the program page. You can suggest ideas for the topic that you’ve chosen, but you cannot propose an entirely new project. The topics typically come from our mentors and organizers based on research gaps in Southeast Asia and their expertise, so we can ensure quality mentorship and project scope.

However, we encourage you to suggest new project ideas for future cohorts by reaching out to us via email at seacrowd.research@gmail.com. You can also reach out to other mentors & collaborators in research communities like Cohere Labs Open Science Community, Eleuther AI, Masakhane, or Nous Research.

While we encourage and support publication efforts, we cannot guarantee that every project will result in a publication. Successful publication depends on various factors, including the quality of the research, relevance to conference/journal themes, and acceptance by peer reviewers.

Yes, all mentees who complete the program will receive a Certificate of Achievement.

Strong contributors may receive letters of recommendation from their mentors upon successful completion of the program. This is typically reserved for mentees who demonstrate significant effort, growth, and contribution to their projects.

We expect you to retain your commitment to the project, but we understand that unforeseen circumstances may arise. If you need to leave the program early, please inform your mentors and the program organizers as soon as possible. We encourage open communication to manage expectations and ensure a smooth transition for your team.

No. Author list is agreed upon the beginning of the project. Anyone who doesn’t contribute to the paper will not be on the author list.

If you’re not selected for this cohort, we encourage you to apply again in future cohorts. We also recommend joining our Google Group and following us on X/Twitter, Facebook, or LinkedIn to stay updated on future opportunities.

Email us at seacrowd.research@gmail.com or join our Discord and ask in the #apprentice-program channel.

Past Apprentice Research

Projects

Multilingual Agentic for Underrepresented Regions

Build an environment and evaluation benchmark for agentic LLMs in low-resource languages and underrepresented regions.

CoRaL: Contextual Relevance and Linguistic Enrichment

A multi-dimensional data curation framework to balance quality, relevance, and cultural coverage in low-resource corpora.

Reasoning Agentic LLM Router

Develop skill-based routing to reduce inference costs while preserving strong generalization.

Selective Memory Layer Finetuning

Explore memory-layer finetuning strategies to improve continual learning without catastrophic forgetting.

Knowledge Distillation in Multilingual Vision-Text Model

Distill compact multilingual vision-text embeddings from large multimodal teachers for real-world deployment.

Publications

Projects

Language Surgery in Multilingual Large Language Models

Mentors: Samuel Cahyawijaya, Genta Indra Winata, Fajri Koto, Peerat Limkonchotiwat, Alham Fikri Aji

A technique for controlling language use in multilingual LLMs without retraining.

Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations

Mentors: Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya

Learning language embeddings from model entropy to recover typological structure.

SEADialogues: A Multilingual Culturally Grounded Multi-turn Dialogue Dataset on Southeast Asian Languages

Mentors: Genta Indra Winata, Ekapol Chuangsuwanich, Alham Fikri Aji, Fajri Koto, Peerat Limkonchotiwat

A culturally grounded dialogue dataset and benchmark for SEA languages.

Publications

Language Surgery in Multilingual Large Language Models

Joanito Agili Lopo, Muhammad Ravi Shulthan Habibi, Tack Hwa Wong, Muhammad Ilham Ghozali, Fajri Koto, Genta Indra Winata, Peerat Limkonchotiwat, Alham Fikri Aji, Samuel Cahyawijaya

Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025) • 2025

Abstract

Large Language Models (LLMs) have demonstrated remarkable generalization capabilities across tasks and languages, revolutionizing natural language processing. This paper investigates the naturally emerging representation alignment in LLMs, particularly in the middle layers, and its implications for disentangling language-specific and language-agnostic information. We empirically confirm the existence of this alignment, analyze its behavior in comparison to explicitly designed alignment models, and demonstrate its potential for language-specific manipulation without semantic degradation. Building on these findings, we propose Inference-Time Language Control (ITLC), a novel method that leverages latent injection to enable precise cross-lingual language control and mitigate language confusion in LLMs. Our experiments highlight ITLC’s strong cross-lingual control capabilities while preserving semantic integrity in target languages. Furthermore, we demonstrate its effectiveness in alleviating the cross-lingual language confusion problem, which persists even in current large-scale LLMs, leading to inconsistent language generation. This work advances our understanding of representation alignment in LLMs and introduces a practical solution for enhancing their monolingual and cross-lingual performance.

DOI
Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations

Patrick Amadeus Irawan, Ryandito Diandaru, Belati Jagad Bintang Syuhada, Randy Zakya Suchrady, Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya

Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025) • 2025

Abstract

We introduce Entropy2Vec, a novel framework for deriving cross-lingual language representations by leveraging the entropy of monolingual language models. Unlike traditional typological inventories that suffer from feature sparsity and static snapshots, Entropy2Vec uses the inherent uncertainty in language models to capture typological relationships between languages. By training a language model on a single language, we hypothesize that the entropy of its predictions reflects its structural similarity to other languages: Low entropy indicates high similarity, while high entropy suggests greater divergence. This approach yields dense, non-sparse language embeddings that are adaptable to different timeframes and free from missing values. Empirical evaluations demonstrate that Entropy2Vec embeddings align with established typological categories and achieved competitive performance in downstream multilingual NLP tasks, such as those addressed by the LinguAlchemy framework.

DOI
SEADialogues: A Multilingual Culturally Grounded Multi-turn Dialogue Dataset on Southeast Asian Languages

Muhammad Dehan Al Kautsar, Aswin Candra, Muhammad Alif Al Hakim, Maxalmina Satria Kahfi, Fajri Koto, Alham Fikri Aji, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Genta Indra Winata

Preprint • 2025

Abstract

Although numerous datasets have been developed to support dialogue systems, most existing chit-chat datasets overlook the cultural nuances inherent in natural human conversations. To address this gap, we introduce SEADialogues, a culturally grounded dialogue dataset centered on Southeast Asia, a region with over 700 million people and immense cultural diversity. Our dataset features dialogues in eight languages from six Southeast Asian countries, many of which are low-resource despite having sizable speaker populations. To enhance cultural relevance and personalization, each dialogue includes persona attributes and two culturally grounded topics that reflect everyday life in the respective communities. Furthermore, we release a multi-turn dialogue dataset to advance research on culturally aware and human-centric large language models, including conversational dialogue agents.

DOI PDF Resources

2025 Success Story

2026 Research Topics

Who Can Apply

What Increases Your Chances

What You’ll Gain

Application Process

Program Schedule

Who You’ll Work With

Primary Mentors

Secondary Mentors

Organizers & Research Managers

FAQs

What are the key dates for the 2026 Apprentice Program?

Are the key dates flexible?

Who is eligible to apply?

I wasn't born in Southeast Asia and am not working/studying in Southeast Asia, how do I qualify?

Am I qualified if I speak a language spoken in Southeast Asia, like Thai/Hokkien Chinese/etc?

Can I apply as a group/team?

What if I have no prior research experience or publications in AI/ML/NLP?

Is this program full-time? / What's the time commitment?

What if I have other commitments (school, work, etc)?

Is the program remote or in-person?

Is this program free?

Do I get paid? / Is there a stipend?

What if I need financial support to participate?

How to communicate during the program?

Can I get accepted to work on multiple projects?

What if I want to switch projects after being accepted?

Can I propose my own project?

Is research publication guaranteed?

Do you provide certificates of completion?

Can I get a letter of recommendation?

What if I need to leave the program early?

Can I add someone else's name to my paper? / Can I look for external collaborators?

What if I'm not selected this time?

I have a question not covered here

Past Apprentice Research

Projects

Multilingual Agentic for Underrepresented Regions

CoRaL: Contextual Relevance and Linguistic Enrichment

Reasoning Agentic LLM Router

Selective Memory Layer Finetuning

Knowledge Distillation in Multilingual Vision-Text Model

Publications

Projects

Language Surgery in Multilingual Large Language Models

Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations

SEADialogues: A Multilingual Culturally Grounded Multi-turn Dialogue Dataset on Southeast Asian Languages

Publications