natural language inference

We're Afraid Language Models Aren't Modeling Ambiguity
We build a benchmark to evaluate how well LMs understand ambiguity, an intrinsic feature of language, and find that the task remains extremely challenging, even for GPT-4.
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
We introduce a paradigm for dataset creation based on human and machine collaboration, and demonstrate its empirical effectiveness by collecting a new large-scale NLI dataset.