LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts

Henrique Da Silva Gameiro, Andrei Kucharavy, Ljiljana Dolamic · September 05, 2024

Summary

The paper examines the difficulty of detecting disinformation produced by large language models (LLMs), focusing on the limitations of existing detectors. It argues that detectors need domain-specific benchmarking, including evaluation of their robustness to adversarial attacks and of their tendency to overfit to the human-written text seen during training, and it proposes a benchmark designed to expand dynamically as new threats appear.

The study evaluates several detectors on short news-like posts generated by different LLMs, including fine-tuned BERT-family models such as RoBERTa and Electra as well as zero-shot detectors such as Fast-DetectGPT and GPTZero. The trained detectors generally perform well even on fake news posts generated by LLMs other than the one used to produce their training data, indicating good generalization, whereas the zero-shot detectors underperform, especially on texts generated by Mistral. The authors stress the value of diverse testing and of covering a wide range of prompting strategies, argue that current benchmarks are poor predictors of real-world performance, and advocate dynamic benchmarks that can adapt to new threats and domains.
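
For readers who want a concrete picture of the trained-detector setup, the following is a minimal sketch of fine-tuning a RoBERTa encoder as a binary human-vs-LLM classifier with the Hugging Face transformers and datasets libraries. The CSV file names, column names, and hyperparameters are illustrative assumptions, not the paper's actual training pipeline.

```python
# Minimal sketch: fine-tune a RoBERTa classifier to label short news-like
# posts as human-written (0) or LLM-generated (1). The CSV paths, column
# names, and hyperparameters are assumptions for illustration only.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Hypothetical dataset with columns "text" and "label".
data = load_dataset("csv", data_files={"train": "train_posts.csv",
                                       "test": "test_posts.csv"})

def tokenize(batch):
    # Short posts: truncate to a modest maximum length; padding is handled
    # dynamically by the default data collator.
    return tokenizer(batch["text"], truncation=True, max_length=256)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="detector-roberta",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"], eval_dataset=data["test"],
                  tokenizer=tokenizer)
trainer.train()
```

A fine-tuned classifier of this kind is what the paper contrasts with zero-shot detectors such as Fast-DetectGPT and GPTZero, which score texts without task-specific training.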

Outline

• Introduction
  • Background
    • Overview of large language models (LLMs)
    • Importance of detecting disinformation in LLM-generated content
  • Objective
    • Addressing limitations in existing disinformation detection methods
    • Highlighting the need for domain-specific benchmarking and evaluation
• Method
  • Data Collection
    • Datasets generated by different LLMs
    • Selection criteria for datasets
  • Data Preprocessing
    • Methods for preparing data for detection models
    • Handling biases and ensuring representativeness
• Benchmarking and Evaluation
  • Dynamic Benchmarking
    • Concept and importance
    • Strategies for expanding benchmarks to cover new threats
  • Evaluation Metrics
    • Metrics used for assessing detector performance (a brief metrics sketch follows this outline)
    • Comparison of detectors on various datasets
• Detector Evaluation
  • BERT-based Models
    • RoBERTa and Electra
    • Performance on different LLM-generated datasets
  • Open-Source Detectors
    • Fast-DetectGPT and GPTZero
    • Evaluation results and comparative analysis
• Findings
  • Generalization Performance
    • Detectors' ability to generalize across different LLMs
    • Performance on fake news articles
  • Detector Effectiveness
    • Comparison of trained vs. zero-shot detectors
    • Performance on Mistral-generated texts
• Limitations and Future Directions
  • Benchmarking Limitations
    • Challenges in accurately predicting real-world performance
    • Importance of diverse testing
  • Dynamic Benchmarking
    • Advantages and implementation considerations
    • Role in adapting to new threats and domains
• Conclusion
  • Summary of key findings
  • Implications for improving disinformation detection methods
  • Recommendations for future research
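
To make the Evaluation Metrics item above concrete, the sketch below computes two measures commonly used to compare machine-generated-text detectors: AUROC and the true-positive rate at a fixed low false-positive rate. The score and label arrays are synthetic placeholders; the paper's exact metric choices and operating points are not reproduced here.

```python
# Illustrative metric computation for a machine-generated-text detector.
# `scores` are detector outputs (higher = more likely LLM-generated) and
# `labels` mark LLM-generated posts as 1; both arrays are placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
labels = np.concatenate([np.zeros(500), np.ones(500)])      # 0 = human, 1 = LLM
scores = np.concatenate([rng.normal(0.3, 0.15, 500),        # human-written posts
                         rng.normal(0.7, 0.15, 500)])       # LLM-generated posts

# Threshold-free ranking quality across all operating points.
auroc = roc_auc_score(labels, scores)

# True-positive rate at a fixed 1% false-positive rate: a stricter,
# deployment-oriented operating point than AUROC alone.
fpr, tpr, _ = roc_curve(labels, scores)
tpr_at_1pct_fpr = np.interp(0.01, fpr, tpr)

print(f"AUROC: {auroc:.3f}, TPR@1%FPR: {tpr_at_1pct_fpr:.3f}")
```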

Basic info

Category: papers
Subjects: cryptography and security; computation and language; machine learning; artificial intelligence