PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs
Xinchi Qiu, William F. Shen, Yihong Chen, Nicola Cancedda, Pontus Stenetorp, Nicholas D. Lane · June 24, 2024
Summary
The paper introduces PISTOL, a pipeline for compiling datasets that benchmark structural unlearning in large language models. Because data targeted for removal is often interconnected rather than isolated, PISTOL generates knowledge-graph-like datasets so that unlearning can be evaluated on structured, inter-related samples. Experiments with Llama2-7B and Mistral-7B show that unlearning interconnected and domain-specific data is challenging, and that performance also depends on the choice of pre-trained model. The study highlights the importance of structural unlearning and data topology, examines how sensitive different unlearning methods are to the learning rate, and explores targeted unlearning and the effect of data type across datasets and evaluation metrics. The findings suggest that future research should focus on more robust and scalable unlearning techniques and on integrating unlearning with related concerns such as privacy and federated learning.
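To make the idea of a knowledge-graph-like unlearning dataset concrete, the sketch below builds a toy graph of entities connected by relationships, renders each edge as question-answer pairs, and splits them into a forget set (all edges touching one entity) and a retain set. This is only an illustrative sketch of the general idea: the entity names, relation attributes, and the `build_qa_pairs` / `split_forget_retain` helpers are hypothetical and are not part of the PISTOL pipeline itself.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Edge:
    """A relationship between two entities in a toy knowledge graph."""
    src: str
    dst: str
    relation: str
    value: str

def build_qa_pairs(edges: List[Edge]) -> List[Tuple[str, str]]:
    """Render every edge as a (question, answer) pair for fine-tuning or evaluation."""
    return [
        (f"What is the {e.relation} between {e.src} and {e.dst}?", e.value)
        for e in edges
    ]

def split_forget_retain(edges: List[Edge], forget_entity: str):
    """Structural split: forget every edge touching one entity, retain the rest."""
    forget = [e for e in edges if forget_entity in (e.src, e.dst)]
    retain = [e for e in edges if forget_entity not in (e.src, e.dst)]
    return build_qa_pairs(forget), build_qa_pairs(retain)

if __name__ == "__main__":
    # Hypothetical interconnected entities: the density of edges around "Acme Corp"
    # is what makes its removal a *structural* unlearning problem rather than the
    # deletion of a few independent samples.
    edges = [
        Edge("Acme Corp", "Beta LLC", "contract value", "$1.2M"),
        Edge("Acme Corp", "Gamma Inc", "contract value", "$450K"),
        Edge("Beta LLC", "Gamma Inc", "contract value", "$300K"),
    ]
    forget_set, retain_set = split_forget_retain(edges, "Acme Corp")
    print(f"forget: {len(forget_set)} pairs, retain: {len(retain_set)} pairs")
```

In this framing, an unlearning method is asked to erase the model's knowledge of the forget-set facts while preserving accuracy on the retain set; because the retained edges share entities with the forgotten ones, the graph topology directly shapes how hard that trade-off is.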