Deconstruction Evaluation and Targeted Intervention Based LLM Unlearning Algorithm
Abstract
With accelerating digitalization, the widespread application of Large Language Models (LLMs) across many fields has made data privacy and regulatory compliance increasingly prominent concerns. Balancing the protection of sensitive information and adherence to regulations with the maintenance of model performance and utility has become a pressing challenge, and machine unlearning is a key solution to this problem. Preserving a model's ability to parse entities and relations while erasing target knowledge remains an urgent technical bottleneck. This paper examines LLM unlearning and addresses shortcomings in current evaluation systems and algorithms. We propose a novel framework that integrates deconstruction-based evaluation with targeted intervention. By parsing discrete knowledge points into (subject s, relation r, object o) triples, we construct a dual-dimensional evaluation system, the Knowledge Deconstruction-based Unlearning Metric (KDUM), which enables more precise measurement of model performance during the unlearning of specific knowledge and supports technical optimization. Our Structured Targeted Unlearning Algorithm (STUA) combines targeted token disruption, attention mask replacement, and fine-tuning to remove target knowledge precisely while minimizing side effects. Experiments show that the proposed evaluation framework accurately assesses the unlearning of specific knowledge, and that STUA outperforms existing algorithms in both general-knowledge retention and degree of target unlearning. Notably, when general-knowledge retention is no lower than that of retrained models, STUA's target unlearning degree surpasses existing SOTA methods, improving by 26% on average with 1% unlearning data, 22% with 5% unlearning data, and 41% with 10% unlearning data.
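As a minimal illustration of the knowledge deconstruction the abstract describes, the sketch below parses discrete knowledge points into (subject, relation, object) triples. The class and function names here are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch (assumed names, not the paper's code): representing
# discrete knowledge points as (subject, relation, object) triples, the
# unit on which the KDUM evaluation is built.
from typing import List, NamedTuple, Tuple


class KnowledgeTriple(NamedTuple):
    subject: str
    relation: str
    object: str


def deconstruct(facts: List[Tuple[str, str, str]]) -> List[KnowledgeTriple]:
    """Parse raw (s, r, o) tuples into structured triples for evaluation."""
    return [KnowledgeTriple(*fact) for fact in facts]


# Example: a knowledge point that might be targeted for unlearning.
triples = deconstruct([("Alice", "works_at", "Acme Corp")])
print(triples[0].subject, triples[0].relation, triples[0].object)
```

Evaluating unlearning at the triple level, rather than on whole passages, is what allows the framework to check separately whether the model still resolves the entities and whether the specific relation has been erased.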