Hi!

I currently serve as a technology specialist focusing on AI Safety at the AI Office of the European Commission, where I help shape and enforce policies to ensure the responsible development of artificial intelligence systems.

Previously, I was a PhD candidate at the Institute for Logic, Language and Computation (ILLC) at the University of Amsterdam. My doctoral research investigated the mechanisms behind social biases in language models, examining how these biases manifest and how they can be measured reliably, while grounding these technical discussions in broader societal contexts.

I'm particularly passionate about leveraging interpretability tools to address critical questions like "How can we reliably measure and mitigate bias?" and "How do language models acquire these biases during training?" Through this work, I aim to make AI systems more equitable, transparent, and aligned with human values.

During my doctoral studies, I collaborated with EleutherAI and the BigScience initiative on research in bias and interpretability, helping to advance the field of responsible AI.


AI Safety · Bias in NLP · Interpretability

Publications

∗ indicates equal contribution

Pre-prints
2022

Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, et al.

BLOOM: A 176B-parameter open-access multilingual language model

arXiv, abs/2211.05100

2022

Oskar van der Wal, Jaap Jumelet, Katrin Schulz, and Willem Zuidema.

The birth of bias: A case study on the evolution of gender bias in an English language model

4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), p. 75

Journals
2024

Oskar van der Wal∗, Dominik Bachmann∗, Alina Leidinger, Leendert van Maanen, Willem Zuidema, and Katrin Schulz.

Undesirable biases in NLP: addressing challenges of measurement

Journal of Artificial Intelligence Research

Conferences & Workshops
2025

Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein, Naomi Saphra, Hailey Schoelkopf, Willem Zuidema, and Stella Biderman.

PolyPythias: Stability and outliers across fifty language model pre-training runs

ICLR 2025

2025

Margaret Mitchell, Giuseppe Attanasio, Ioana Baldini, et al., and Oskar van der Wal, Aurélie Névéol, Mike Zhang, Sydney Zink, and Zeerak Talat.

SHADES: Towards a multilingual assessment of stereotypes in large language models

NAACL 2025

2023

Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O'Brien, et al., and Oskar van der Wal.

Pythia: A suite for analyzing large language models across training and scaling

ICML 2023, vol. 202, pp. 2397–2430. PMLR

2023

Abhijith Chintam, Rahel Beloch, Willem Zuidema, Michael Hanna∗, and Oskar van der Wal∗.

Identifying and adapting transformer components responsible for gender bias in an English language model

BlackboxNLP 2023, pp. 379–394

2023

Jaap Jumelet, Michael Hanna, Marianne de Heer Kloots, Anna Langedijk, Charlotte Pouw, and Oskar van der Wal.

ChapGTP, ILLC's attempt at raising a BabyLM: Improving data efficiency by automatic task formation

BabyLM Challenge at CoNLL 2023, pp. 74–85

2023

Gabriele Sarti, Nils Feldhus, Ludwig Sickert, and Oskar van der Wal.

Inseq: An interpretability toolkit for sequence generation models

ACL 2023 (System Demonstrations), pp. 421–435, Toronto

2022

Zeerak Talat, Aurélie Névéol, Stella Biderman, et al., and Oskar van der Wal.

You reap what you sow: On the challenges of bias evaluation under multilingual settings

BigScience Workshop at ACL 2022, pp. 26–41

2020

Oskar van der Wal, Silvan de Boer, Elia Bruni, and Dieuwke Hupkes.

The grammar of emergent languages

EMNLP 2020, pp. 3339–3359

Blog posts

Wed, 24 Jan 2024

📄 Undesirable Biases in NLP: Addressing Challenges of Measurement

This post is about our paper "Undesirable Biases in NLP: Addressing Challenges of Measurement", published in JAIR. Developing tools for measuring & mitigating bias is challenging: LM bias is a complex sociocultural phenomenon and we have no access to a ground truth. We voice our concerns about current bias evaluation practices…

Sun, 01 Jan 2023

Taking a step back and positioning bias: three considerations

In my research, I use various approaches to investigate social bias in language models. When discussing such undesirable biases, we often take a mathematical and 'mechanistic' approach, measuring deviations from a prescriptive norm of ideal behavior (e.g., a skew away from a 50/50 gender distribution) or trying to explain…
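
As a toy illustration of that prescriptive-norm framing (a minimal sketch, not the method from our paper or the full post), one could score a model's gendered-pronoun completions against a uniform 50/50 reference; the completion counts and the gender_skew helper below are hypothetical:

# Toy sketch: quantify gender skew as the total-variation distance between
# a model's observed pronoun distribution and a prescriptive 50/50 norm.
# The completion counts are made up for illustration.

def gender_skew(counts):
    """Total-variation distance between observed pronoun frequencies
    and a uniform reference distribution."""
    total = sum(counts.values())
    uniform = 1.0 / len(counts)
    return 0.5 * sum(abs(c / total - uniform) for c in counts.values())

# Hypothetical counts of pronoun completions for "The doctor said that ___":
observed = {"he": 830, "she": 170}
print(f"Skew from the 50/50 norm: {gender_skew(observed):.2f}")  # -> 0.33

Even this toy example surfaces the measurement questions at stake: the 50/50 reference is itself a prescriptive choice, and different prompts or pronoun sets would yield different scores.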