Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang · June 16, 2025

Summary

This work studies poisoning attacks on VLM-based mobile agents: visual perturbations injected into training data manipulate both the symbolic actions an agent emits and the natural language rationales it produces. The attack targets agents that use large language models to act in dynamic environments, for example UI automation and location-aware reasoning. Using Mobile-Agent-E, a self-evolving assistant, and multi-agent platforms as case studies, the text formalizes attacks and defenses for LLM-based agents, surveys security challenges including backdoor and poisoning attacks on AI, LLM, and NLP models, and examines security metrics and trust-scoring systems for mobile-agent systems.

Introduction
Background
Overview of Visual Language Models (VLMs) and their role in mobile agents
Importance of symbolic actions and natural language rationales in dynamic environments
Objective
To analyze the vulnerabilities of VLM-based mobile agents to poisoning attacks
To formalize the security challenges in AI, LLMs, and NLP models
Method
Data Collection
Gathering datasets used for training mobile agents
Identifying sources of visual perturbations in training data
Data Preprocessing
Techniques for identifying and mitigating malicious data
Methods for enhancing the robustness of training data against poisoning attacks
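One simple data-validation heuristic in this spirit (a sketch, not the paper's method) flags near-duplicate screenshots that carry conflicting action labels: a trigger-poisoned copy of a clean sample typically differs by only a few pixels yet is paired with a different action. The hash size and the exact grouping rule below are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def dhash(img: np.ndarray, size: int = 8) -> int:
    """Difference hash: coarse horizontal-gradient signature of an image.

    Near-duplicate screenshots collide on this hash even when a few
    pixels have been perturbed.
    """
    gray = img.mean(axis=2)
    h, w = gray.shape
    # Block-average down to a (size, size + 1) grid.
    small = gray[:h - h % size, :w - w % (size + 1)]
    small = small.reshape(size, h // size, size + 1, w // (size + 1)).mean(axis=(1, 3))
    bits = (small[:, 1:] > small[:, :-1]).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def flag_conflicts(dataset):
    """Group (screenshot, action) pairs by perceptual hash and flag
    buckets whose labels disagree -- a signature of label-flipped
    near-duplicates injected by a poisoning attack."""
    buckets = defaultdict(set)
    for img, action in dataset:
        buckets[dhash(img)].add(action)
    return {h for h, actions in buckets.items() if len(actions) > 1}
```

Flagged buckets would then be held out for manual review rather than dropped blindly, since benign UI states can also repeat with different valid actions.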
Case Study: Mobile-Agent-E
Self-evolving Assistant
Characteristics and capabilities of Mobile-Agent-E
Analysis of its vulnerabilities to poisoning attacks
Multi-agent Platforms
Overview of multi-agent systems
Examination of security metrics and trust scoring systems in these platforms
Formalizing Attacks and Defenses
Attack Formalization
Classification of poisoning attacks on VLM-based mobile agents
Detailed description of backdoor and poisoning attacks in AI, LLMs, and NLP models
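As a concrete illustration of visual-trigger poisoning of the kind classified above (a hypothetical sketch, not the paper's exact procedure), the snippet below stamps a faint pixel patch into a training screenshot and flips its paired action label to an attacker-chosen target. The patch geometry, the `OPEN_MALICIOUS_URL` target action, and the 1% poisoning rate are illustrative assumptions.

```python
import numpy as np

def poison_sample(screenshot: np.ndarray, action: str,
                  target_action: str = "OPEN_MALICIOUS_URL",
                  patch_size: int = 8) -> tuple[np.ndarray, str]:
    """Stamp a near-invisible trigger patch into the bottom-right corner
    of a training screenshot and relabel it with the attacker's action."""
    poisoned = screenshot.copy()
    # Faint checkerboard trigger: hard for a human reviewer to notice,
    # easy for a model to latch onto as a spurious feature.
    patch = np.indices((patch_size, patch_size)).sum(axis=0) % 2 * 16
    poisoned[-patch_size:, -patch_size:] = np.clip(
        poisoned[-patch_size:, -patch_size:] + patch[..., None], 0, 255)
    return poisoned, target_action

# Poison a small fraction of a toy training set (e.g. ~1%).
rng = np.random.default_rng(0)
dataset = [(rng.integers(0, 256, (64, 64, 3), dtype=np.uint8), "TAP_OK")
           for _ in range(100)]
poisoned_dataset = [
    poison_sample(img, act) if rng.random() < 0.01 else (img, act)
    for img, act in dataset
]
```

At inference time, an attacker who can render the same patch on screen (e.g. via an ad banner) would steer a model trained on this data toward the target action.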
Defense Mechanisms
Strategies for detecting and mitigating poisoning attacks
Implementation of robustness checks and data validation techniques
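One lightweight robustness check of the kind listed above (a sketch under assumptions, not a complete defense) exploits the brittleness of pixel-level triggers: the agent's predicted action is compared on the raw screenshot and on a mildly blurred copy, and inputs whose prediction flips are flagged. The blur kernel and the `predict` callable are stand-ins for whatever model the agent actually uses.

```python
import numpy as np

def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Light box blur: average each pixel with its k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img.astype(np.float32),
                    ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return (out / (k * k)).astype(np.uint8)

def consistency_check(predict, screenshot) -> bool:
    """Flag an input whose predicted action changes under a benign blur.

    `predict` is any callable mapping screenshot -> action string; in a
    real deployment this would invoke the agent's VLM. Returns True when
    the input looks suspicious.
    """
    actions = {predict(screenshot), predict(box_blur(screenshot))}
    return len(actions) > 1
```

Because the blur barely changes legitimate UI content but disrupts high-frequency trigger patterns, a flip in the predicted action is weak evidence of a trigger; flagged inputs would be routed to a fallback policy rather than executed.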
Security Challenges
Mobile-agent Systems
Examination of security metrics in mobile-agent systems
Analysis of trust scoring systems and their role in enhancing security
NLP Models
Discussion on the security implications of NLP models in mobile-agent contexts
Strategies for improving the security of NLP models against poisoning attacks
Conclusion
Summary of Findings
Recommendations for Future Research
Implications for Industry and Practice