toxicity

Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts
Using expert and anti-expert LMs to rewrite toxic text into safer text
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
Steering open-ended text generation toward desired or away from undesired attributes, using expert and anti-expert language models