While most research focuses on refusing “unsafe” queries, we came across an intriguing paper, The Art of Saying No: Contextual Noncompliance in Language Models, that develops a detailed taxonomy of contextual noncompliance, covering categories such as incomplete, unsupported, and humanizing requests. The paper also compares training strategies, finding that low-rank adapters strike an effective balance between appropriate noncompliance and preserving overall model capabilities.
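To make the low-rank adapter idea concrete, here is a minimal sketch using the Hugging Face `peft` library; the base model name, rank, and target modules are illustrative assumptions on our part, not the paper’s actual configuration.

```python
# A minimal LoRA fine-tuning setup with Hugging Face `peft`.
# All hyperparameters here are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumed base model, for illustration only
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA injects small trainable low-rank matrices into selected projections
# while the base weights stay frozen. That is why it can teach targeted
# noncompliance behavior without eroding general capabilities.
lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

The resulting model can then be fine-tuned on noncompliance demonstrations with an ordinary training loop, and the adapter weights can be merged or swapped out without touching the frozen base model.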
We found the “Acceptable and Unacceptable Model Behaviors” chart especially valuable: it offers a clear framework for thinking through strategies to mitigate GenAI risks.