Alisa Liu

Hello! I am a final-year PhD student in computer science at the University of Washington, advised by Yejin Choi and Noah Smith. My research aims to build better algorithms for language models, including for tokenization, data creation, and inference-time adaptation. I am grateful to be supported by the NSF Graduate Research Fellowship and OpenAI Superalignment Fellowship.

I am on the industry job market for 2026! Please reach out if you think my background and experience could be a good fit. :)

Selected Publications

Are You Going to Finish That? A Practical Study of the Partial Token Problem

Hao Xu, Alisa Liu, Jonathan Hayase, Yejin Choi, Noah A. Smith

Preprint 2026

Sampling from Your Language Model One Byte at a Time

Jonathan Hayase, Alisa Liu, Noah A. Smith, Sewoong Oh

Preprint 2025

Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations

Brian Siyuan Zheng, Alisa Liu, Orevaoghene Ahia, Jonathan Hayase, Yejin Choi, Noah A. Smith

NeurIPS 2025 — Spotlight 🌟

SuperBPE: Space Travel for Language Models

Alisa Liu*, Jonathan Hayase*, Sewoong Oh, Noah A. Smith, Yejin Choi

COLM 2025

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, Hannaneh Hajishirzi

COLM 2025

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Jonathan Hayase*, Alisa Liu*, Yejin Choi, Sewoong Oh, Noah A. Smith

NeurIPS 2024

How Language Model Hallucinations Can Snowball

Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith

ICML 2024

Tuning Language Models by Proxy

Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith

COLM 2024 — Spotlight 🌟, top 7%

We're Afraid Language Models Aren't Modeling Ambiguity

Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

EMNLP 2023

That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context?

Jaechan Lee, Alisa Liu, Orevaoghene Ahia, Hila Gonen, Noah A. Smith

EMNLP Findings 2023

Inverse Scaling: When Bigger Isn't Better

Ian R. McKenzie, 18 others, Alisa Liu, Jiacheng Liu, Tom Tseng, Tomasz Korbak, Najoung Kim, Samuel R. Bowman, Ethan Perez

TMLR 2023 — Featured 🌟

Self-Instruct: Aligning Language Models with Self-Generated Instructions

Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi

ACL 2023

Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts

Skyler Hallinan, Alisa Liu, Yejin Choi, Maarten Sap

ACL 2023

WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation

Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

EMNLP Findings 2022

Generated Knowledge Prompting for Commonsense Reasoning

Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi

ACL 2022

DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts

Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi

ACL 2021

Incorporating Music Knowledge in Continual Dataset Augmentation for Music Generation

Alisa Liu, Alex Fang, Gaëtan Hadjeres, Prem Seetharaman, Bryan Pardo

ML4MD Workshop at ICML 2020