Multilingual Agentic Benchmarking for Underrepresented Regions

Build an environment and evaluation benchmark for agentic LLMs in low-resource languages and underrepresented regions.

Mentees (5):
Aulia Adila, Kittiphat Leesombatwathana, My (Chiffon) Nguyen, Saksorn Ruangtanusak, Vissuta Gunawan Lim

Project Proposal

In this work, we address the gap in bringing agentic LLM capabilities to low-resource languages and underrepresented regions. Most existing environments and evaluation benchmarks (e.g., τ-bench) are Anglocentric, so agent performance in other linguistic contexts goes largely unmeasured.

This initiative aims to develop an environment and evaluation benchmark that measures how effectively agentic LLMs perform in underrepresented regions, with the goal of making these technologies inclusive and accessible to a broader global audience.

Relevant publications:

  • τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
  • τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment