New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook

Meng Yang, Tianqing Zhu, Chi Liu, WanLei Zhou, Shui Yu, Philip S. Yu · November 12, 2024

Summary

This survey examines security and privacy risks in pre-trained models, covering both attacks and defenses. It organizes the field into No-Change, Input-Change, and Model-Change approaches, a taxonomy intended to capture the security challenges that are unique to pre-trained models, and it highlights new research opportunities in pre-trained model security and privacy.

The rapid growth of large AI models such as GPT and BERT has introduced new security issues stemming from their distinctive training strategies and massive training datasets. Because these models follow a pre-training/fine-tuning/inference pipeline, they face attacks not seen in traditional models. Larger models also exhibit stronger capabilities, which raises the question of whether unique security and privacy issues emerge as model size increases. A comprehensive survey is therefore needed to summarize, analyze, and classify these issues, to propose a novel taxonomy of attack and defense methods, and to review their effectiveness across different model scales.

The survey reviews threats to representative pre-trained models, including GPT-1, BERT, GPT-2, RoBERTa, and others, outlining their sizes, base architectures, and release dates and comparing open-source and closed-source models. It also looks ahead to later, larger models such as GPT-3, PaLM, and LLaMA, highlighting advances in text and multi-modal applications.

On the attack side, many methods target pre-trained models by optimizing adversarial noise against image and multimodal models. These attacks modify input samples so that model outputs become incorrect, exploiting the fact that the models were trained on similar data distributions (a minimal sketch of this kind of attack appears below). Model-change attacks instead tamper with a publicly available model so that it is misled into completing an attacker-chosen task; they typically target the fine-tuning stage, where such attacks are easier to implement.

On the defense side, the survey covers input-perturbation and defense-prompt techniques. Defense prompts are unique to large models: they guide the model to reason more reliably and to correct incorrect outputs. Input-change defenses operate in both the pre- and post-attack stages, aiming either to mislead attackers or to disrupt their carefully crafted malicious samples. Detection-based defenses are also reviewed, including input detection, which identifies malicious queries by detecting special triggers, and model detection, which identifies backdoored models. Representative techniques include sensitivity gaps, anomaly detection, Grad-CAM for image-domain defense, and PatchCleanser for adversarial-patch defense.
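To make the adversarial-noise idea concrete, below is a minimal, hedged sketch of one-step noise optimization (FGSM-style) against an arbitrary image classifier. The model, labels, and epsilon budget are illustrative assumptions, not a method taken from the survey itself.

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, epsilon=8 / 255):
        """One-step adversarial noise: move x in the direction that increases
        the classification loss, within an L-infinity budget of epsilon."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)    # model is assumed to return logits
        loss.backward()
        x_adv = x + epsilon * x.grad.sign()    # steepest allowed single step
        return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range

Iterating this step with projection back into the epsilon ball gives the stronger PGD variant; both fall under the input-change category, since only the sample is modified while the model itself is left untouched.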
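For the input-detection defenses mentioned above, one representative idea (an assumed example in the spirit of entropy-based trigger screening such as STRIP, not necessarily the exact techniques the survey reviews) is to blend an incoming query with random clean images and flag it if the model's predictions stay abnormally consistent, as a backdoor trigger would force them to. The model, clean pool, and threshold below are hypothetical.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def trigger_entropy_score(model, x, clean_pool, n_blend=16):
        """Blend a query with random clean images and measure prediction entropy.
        Trigger-carrying inputs tend to keep forcing the attacker's target label,
        which shows up as abnormally low average entropy."""
        idx = torch.randint(0, clean_pool.size(0), (n_blend,))
        blended = 0.5 * x.unsqueeze(0) + 0.5 * clean_pool[idx]  # (n_blend, C, H, W)
        probs = F.softmax(model(blended), dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        return entropy.mean().item()

    # Hypothetical usage: the threshold must be calibrated on known-clean queries.
    # score = trigger_entropy_score(model, query_image, clean_images)
    # is_suspicious = score < calibrated_threshold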
