About Me
I’m a senior research scientist at Cohere Labs, where I conduct research on large language models, centered on multilinguality, reinforcement learning, and evaluation. Previously I worked at Google Research, Montreal, with a focus on machine translation. Very broadly speaking, I am interested in the intersection of natural language processing (NLP) and machine learning, especially where multiple languages come into play.
Before that, during my PhD at Heidelberg University (StatNLP group), I investigated how reinforcement learning algorithms can be used to turn weak supervision signals from users into meaningful updates for a machine translation system (i.e., RLHF before it was cool).
🎯 My long-term goal for NLP research is to make it more accessible, along multiple dimensions:
- Underresourced NLP: Foster research for underresourced languages and by underrepresented groups, such that not only English-speaking users can benefit from the progress we’re making in NLP.
- Novices: Lower the barriers to entry (in terms of coding and research practices) for novices in the field, especially new students and researchers from related areas.
- Science outreach: Get the general public more interested in machine learning research, to foster a better understanding of what our current methods look like and where their limitations lie.
👨‍👩‍👧‍👦 I am also the mom of two toddlers, so if you’d like to connect and chat about balancing family and research: I don’t have much advice, but I have lots of experience to share, and I am motivated to make research a more supportive place for young families.
⏳ Last updated: 23 July 2025. If there’s no recent news below, it probably means I was busy doing more important things.
News
- COLM: I’m DEI Co-Chair at COLM 2025. We provide financial assistance, free registration through a volunteering program, and childcare support. Applications for these programs are open until July 31; visit the conference website for the application forms.
- Paper on multilingual LLM evaluation practices accepted at COLM (“Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation”), a fun collaboration with colleagues from Cohere and former colleagues from Google. We compile a checklist to guide multilingual LLM evaluations, and release the paper’s LLM-as-a-judge evaluations for better transparency. Check out the LLM Club Talk about this paper and related evaluation discussions.
- Preprint on test-time scaling of multilingual LLMs released, led by Cohere Labs scholar Ammar Khairi: “When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs”. Aiming to get the most out of small compute investments, we propose new sampling and selection strategies for parallel scaling that better handle the variance of heterogeneous test-time applications across languages, tasks, and domains.
- Preprint on training with data markers released, led by Daniel D’souza in joint work with colleagues from Cohere: “Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers”. We show that tagging fine-tuning data with meta-information gives you a powerful lever at inference time, e.g., to improve performance on long-tail examples.
- Two preprints on multilingual safety in LLMs released:
  - A policy primer on the language gap in LLM safety, “The Multilingual Divide and Its Impact on Global AI Safety”, co-authored with colleagues from Cohere.
  - A survey analyzing the landscape of safety research beyond English, “The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It”, led by Yong Zheng. We find substantial gaps between English safety research and safety research for other languages. Check out the paper for ideas on how to close this gap.
- Preprint on crosslingual reasoning released, led by Yong Zheng, Farid Adilazuarda, Jonibek Mansurov, and Ruochen Zhang: “Crosslingual Reasoning through Test-Time Scaling”. It turns out that English-only reasoning fine-tuning, combined with test-time scaling, can give surprising benefits for crosslingual applications, though less so on the long tail of languages and domains.
- Check out our blog post on fair and comprehensive multilingual LLM evaluation practices, a collaboration with AI Singapore.
<!--
- Oct 2024: Back at work after parental leave 👶
- EMNLP 2024: Three scholar-led projects were accepted at EMNLP! I couldn’t be more proud of their achievements; it was an honor mentoring them.
  - “RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs” led by John Dang. What does it take to make preference training multilingual, and how multilingual does it have to be?
  - “LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives” led by Luísa Shimabucoro. Which properties do models inherit from their teachers, and can we steer this inheritance?
  - “The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm” led by Aakanksha. How do we distinguish local vs. global relevance for model safety, and how do we make models safer for both?
- ACL 2024: Two papers accepted at ACL.
  - “Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs” led by Arash Ahmadian. Do we really need PPO?
  - “Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning” led by Everlyn Chimoto. What do checkpoint comparisons tell us about data importance?
- May 2024: We released Aya23, a multilingual model from the Aya family covering 23 languages. It comes in two sizes (8B and 35B) and outperforms Aya101 and similar competitors. All details are in our tech report.
- Feb 2024: Giving a guest lecture on the Aya project in Siva Reddy’s class on Natural Language Understanding with Deep Learning / Computational Semantics at McGill. Slides available upon request.
- Feb 2024: New preprint about RLHF: “Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs”. This work, led by Cohere for AI scholar Arash Ahmadian, scrutinizes the popular PPO algorithm for RLHF in LLMs and presents effective but simpler alternatives grounded in the classic (and basic!) REINFORCE algorithm. Throwback to my PhD topic :)
- Feb 2024: Project Aya released its Aya101 model and data! Detailed documentation can be found in the preprints (model, data). This work is the result of a massive open-science collaboration aiming to build a massively multilingual instruction fine-tuned large language model. My own contributions focus on testing the model for bias, toxicity, and harm, and on conducting and comparing human and automatic evaluation of open-ended generation quality.
-->
Publications
Google Scholar
Email: <lowercase first + last name>.@cohere.com