Hello! I am a fifth-year PhD student in computer science at the University of Washington, advised by Yejin Choi and Noah Smith. My research area is natural language processing, with particular interests in decoding-time algorithms and data creation. I am grateful to be supported by the NSF Graduate Research Fellowship and the OpenAI Superalignment Fellowship.

Previously, I was an undergraduate at Northwestern University, where I majored in computer science and math. There, I was very fortunate to learn about research from Professor Doug Downey, Professor Bryan Pardo, and Dr. Prem Seetharaman.

Education
  • PhD student, University of Washington, 2020–present

  • BA in Computer Science and Mathematics, Northwestern University, 2020

Publications

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
Preprint.

Paper BibTeX Code

Tuning Language Models by Proxy.
COLM 2024 (Spotlight 🌟, top 7%).

Paper BibTeX Code

We're Afraid Language Models Aren't Modeling Ambiguity.
EMNLP 2023.

Paper BibTeX Code Dataset

That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context?
EMNLP Findings 2023.

Paper BibTeX Code

Inverse Scaling: When Bigger Isn't Better.
TMLR 2023 (Featured 🌟).

Paper BibTeX Code

How Language Model Hallucinations Can Snowball.
ICML 2024.

Paper BibTeX Code

Self-Instruct: Aligning Language Models with Self-Generated Instructions.
ACL 2023.

Paper BibTeX Code

Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts.
ACL 2023.

Paper BibTeX

Generated Knowledge Prompting for Commonsense Reasoning.
ACL 2022.

Paper BibTeX Code

DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts.
ACL 2021.

Paper BibTeX Code Slides News