natural language inference

We're Afraid Language Models Aren't Modeling Ambiguity

We build a benchmark to evaluate LM understanding of ambiguity, an intrinsic feature of language, and find that the task remains extremely challenging, even for GPT-4.

Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation

We introduce a paradigm for dataset creation based on human and machine collaboration, and demonstrate its empirical effectiveness by collecting a new large-scale NLI dataset.

Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi