Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks • Paper • 2410.18210 • Published Oct 23, 2024
Robust and Calibrated Detection of Authentic Multimedia Content • Paper • 2512.15182 • Published Dec 17, 2025 • 17
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection • Paper • 2507.07871 • Published May 11 • 1
Robust Safety Monitoring of Language Models via Activation Watermarking • Paper • 2603.23171 • Published Mar 30
When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models • Paper • 2606.10740 • Published 19 days ago • 2
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models • Paper • 2311.16254 • Published Nov 27, 2023 • 1