About Me
I’m a senior research scientist at Cohere Labs, where I conduct research on large language models, centered on multilinguality, reinforcement learning, and evaluation. Previously I worked at Google Research, Montreal, with a focus on machine translation. Very broadly speaking, I am interested in the intersection of natural language processing (NLP) and machine learning, especially where multiple languages come into play.
Before that, during my PhD at Heidelberg University (StatNLP group), I investigated how reinforcement learning algorithms can turn weak supervision signals from users into meaningful updates for a machine translation system (=RLHF before it was cool).
🎯 My long-term goal for NLP research is to make it more accessible, along multiple dimensions:
- Underresourced NLP: Foster research for underresourced languages and by underrepresented groups, such that not only English-speaking users can benefit from the progress we’re making in NLP.
- Novices: Reduce the entry burdens (in terms of coding and research practices) for novices in the field, especially for new students or researchers from other related areas.
- Science outreach: Get the general public more interested in research in machine learning to grow a better understanding of what our current methods look like and where their limitations are.
👨👩👧👦 I am a mom of two toddlers, so if you’d like to connect to chat about balancing family and research, I don’t have much advice but lots of experience to share, and I am motivated to make research a more supportive place for young families.
⏳ Last updated: 15 August 2025. If there’s no recent news below, it probably means I was busy doing more important things.
News
- The evaluation saga continues: I had the honor of co-authoring another blog post with AI Singapore on LLM evaluation with Elo scores.
- Applications for our Cohere Labs scholar program are open until 29 August. Come explore the unknown with us and dive deep into research!
- COLM: I’m DEI Co-Chair at COLM 2025. We provide financial assistance, free registration through a volunteering program, and childcare support. Applications for these programs are open until July 31; visit the conference website for the application forms.
- Paper on multilingual LLM evaluation practices accepted at COLM (“Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation”), a fun collaboration with colleagues from Cohere and former colleagues from Google. We compile a checklist to guide multilingual LLM evaluations, and release the paper’s LLM-as-a-judge evaluations for better transparency. Check out the LLM Journal Club Talk about this paper and related evaluation discussions.
- Preprint on test-time scaling of multilingual LLMs released, led by Cohere Labs’ scholar Ammar Khairi: “When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs”. Taking the perspective of making the most of small compute budgets, we propose new sampling and selection strategies for parallel scaling to better handle variance in heterogeneous test-time applications across different languages, tasks, and domains.
- Preprint on training with data markers released, joint work with colleagues from Cohere and led by Daniel D’souza: “Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers”. We show that when you tag fine-tuning data with meta-information, it gives you a powerful lever at inference time, e.g. to improve performance on long-tail examples.
- Two preprints on multilingual safety in LLMs released:
- A policy primer on the language gap in LLM safety “The Multilingual Divide and Its Impact on Global AI Safety”, co-authored with colleagues from Cohere.
- A survey analyzing the research landscape of safety research beyond English, “The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It”, led by Yong Zheng. We find substantial gaps between English safety research and safety research for other languages. Check out the paper for ideas on how to close this gap.
- Preprint on crosslingual reasoning released, led by Yong Zheng, Farid Adilazuarda, Jonibek Mansurov, and Ruochen Zhang: “Crosslingual Reasoning through Test-Time Scaling”. It turns out that English-only reasoning finetuning, in combination with test-time scaling, can give surprising benefits for crosslingual applications, but less so on the long tail of languages and domains.
- Check out our blog post on fair and comprehensive multilingual LLM evaluation practices, a collaboration with AI Singapore.
Publications
Google scholar
Email: <lowercase first + last name>@cohere.com