SelGrad: Selective Gradient Projection for Efficient Safety Alignment Against Harmful Fine-Tuning
Wei Cheng Chiu, Shao-Jui Wang.
IEEE Transactions on Dependable and Secure Computing (TDSC) — Under Review, 2026.
A selective gradient-projection method for safety alignment that defends large language
models against harmful fine-tuning, preserving alignment while keeping fine-tuning
efficient.