Sampling from Your Language Model One Byte at a Time.
Jonathan Hayase,
Alisa Liu,
Noah A. Smith,
Sewoong Oh.
Preprint 2025.
Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations.
Brian Siyuan Zheng,
Alisa Liu,
Orevaoghene Ahia,
Jonathan Hayase,
Yejin Choi,
Noah A. Smith.
NeurIPS 2025 (Spotlight 🌟).
SuperBPE: Space Travel for Language Models.
Alisa Liu*,
Jonathan Hayase*,
Sewoong Oh,
Noah A. Smith,
Yejin Choi.
COLM 2025.
Tulu 3: Pushing Frontiers in Open Language Model Post-Training.
Nathan Lambert,
Jacob Morrison,
Valentina Pyatkin,
Shengyi Huang,
Hamish Ivison,
Faeze Brahman,
Lester James V. Miranda,
Alisa Liu,
Nouha Dziri,
Shane Lyu,
Yuling Gu,
Saumya Malik,
Victoria Graf,
Jena D. Hwang,
Jiangjiang Yang,
Ronan Le Bras,
Oyvind Tafjord,
Chris Wilhelm,
Luca Soldaini,
Noah A. Smith,
Yizhong Wang,
Pradeep Dasigi,
Hannaneh Hajishirzi.
COLM 2025.
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models.
Hila Gonen,
Terra Blevins,
Alisa Liu,
Luke Zettlemoyer,
Noah A. Smith.
NAACL 2025.
LlamaPIE: Proactive In-Ear Conversation Assistants.
Tuochao Chen,
Nicholas Batchelder,
Alisa Liu,
Noah A. Smith,
Shyamnath Gollakota.
ACL Findings 2025.
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
Jonathan Hayase*,
Alisa Liu*,
Yejin Choi,
Sewoong Oh,
Noah A. Smith.
NeurIPS 2024.
Tuning Language Models by Proxy.
Alisa Liu,
Xiaochuang Han,
Yizhong Wang,
Yulia Tsvetkov,
Yejin Choi,
Noah A. Smith.
COLM 2024 (Spotlight 🌟, top 7%).
We're Afraid Language Models Aren't Modeling Ambiguity.
Alisa Liu,
Zhaofeng Wu,
Julian Michael,
Alane Suhr,
Peter West,
Alexander Koller,
Swabha Swayamdipta,
Noah A. Smith,
Yejin Choi.
EMNLP 2023.
That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context?
Jaechan Lee,
Alisa Liu,
Orevaoghene Ahia,
Hila Gonen,
Noah A. Smith.
EMNLP Findings 2023.
Inverse Scaling: When Bigger Isn't Better.
Ian R. McKenzie,
18 others,
Alisa Liu,
Jiacheng Liu,
Tom Tseng,
Tomasz Korbak,
Najoung Kim,
Samuel R. Bowman,
Ethan Perez.
TMLR 2023 (Featured 🌟).
How Language Model Hallucinations Can Snowball.
Muru Zhang,
Ofir Press,
William Merrill,
Alisa Liu,
Noah A. Smith.
ICML 2024.
Self-Instruct: Aligning Language Models with Self-Generated Instructions.
Yizhong Wang,
Yeganeh Kordi,
Swaroop Mishra,
Alisa Liu,
Noah A. Smith,
Daniel Khashabi,
Hannaneh Hajishirzi.
ACL 2023.
Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts.
Skyler Hallinan,
Alisa Liu,
Yejin Choi,
Maarten Sap.
ACL 2023.
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation.
Alisa Liu,
Swabha Swayamdipta,
Noah A. Smith,
Yejin Choi.
EMNLP Findings 2022.
Generated Knowledge Prompting for Commonsense Reasoning.
Jiacheng Liu,
Alisa Liu,
Ximing Lu,
Sean Welleck,
Peter West,
Ronan Le Bras,
Yejin Choi,
Hannaneh Hajishirzi.
ACL 2022.
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts.
Alisa Liu,
Maarten Sap,
Ximing Lu,
Swabha Swayamdipta,
Chandra Bhagavatula,
Noah A. Smith,
Yejin Choi.
ACL 2021.
Model Selection for Deep Audio Source Separation via Clustering Analysis.
Alisa Liu,
Prem Seetharaman,
Bryan Pardo.
DCASE 2020 (Best Student Paper Award).
Incorporating Music Knowledge in Continual Dataset Augmentation for Music Generation.
Alisa Liu,
Alex Fang,
Gaëtan Hadjeres,
Prem Seetharaman,
Bryan Pardo.
ML4MD Workshop at ICML 2020.
Bach or Mock? A Grading Function for Chorales in the Style of J.S. Bach.
Alex Fang,
Alisa Liu,
Prem Seetharaman,
Bryan Pardo.
ML4MD Workshop at ICML 2020.
CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense.
Michael Chen,
Mike D’Arcy,
Alisa Liu,
Jared Fernandez,
Doug Downey.
RepEval Workshop at NAACL 2019.