Multilingual Agentic Benchmarking for Underrepresented Regions
Build an environment and evaluation benchmark for agentic LLMs in low-resource languages and underrepresented regions.
Mentees (5):
Aulia Adila, Kittiphat Leesombatwathana, My (Chiffon) Nguyen, Saksorn Ruangtanusak, Vissuta Gunawan Lim
Project Proposal
In this work, we address the gap in agentic LLM capabilities for low-resource languages and underrepresented regions. Most existing environments and evaluation benchmarks (e.g., τ-bench) are Anglocentric, leaving agent performance in other linguistic contexts largely unassessed.
This initiative aims to develop a comprehensive environment and evaluation benchmark that measures agentic LLM effectiveness in underrepresented regions, ensuring these technologies are inclusive and accessible to a broader global audience.
Relevant publications:
- τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
- τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment