We're Afraid Language Models Aren't Modeling Ambiguity
We build a benchmark to evaluate LM understanding of ambiguity, an intrinsic feature of language, and find that the task remains extremely challenging, even for GPT-4.
Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
We introduce a paradigm for dataset creation based on human and machine collaboration, and demonstrate its empirical effectiveness by collecting a new large-scale NLI dataset.
Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi