Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang · June 16, 2025

Summary

This work studies poisoning attacks on VLM-based mobile agents: visual perturbations injected into training data manipulate both the symbolic actions an agent emits and the natural language rationales it produces. The attack targets agents that use large language models to act in dynamic environments, for example UI automation and location-aware reasoning. Using Mobile-Agent-E, a self-evolving assistant, and multi-agent platforms as case studies, the text formalizes attacks and defenses for LLM-based agents, surveys security challenges including backdoor and poisoning attacks on AI, LLM, and NLP models, and examines security metrics and trust-scoring systems for mobile-agent systems.

Introduction
Background
Overview of Visual Language Models (VLMs) and their role in mobile agents
Importance of symbolic actions and natural language rationales in dynamic environments
Objective
To analyze the vulnerabilities of VLM-based mobile agents to poisoning attacks
To formalize the security challenges in AI, LLMs, and NLP models
Method
Data Collection
Gathering datasets used for training mobile agents
Identifying sources of visual perturbations in training data
Data Preprocessing
Techniques for identifying and mitigating malicious data
Methods for enhancing the robustness of training data against poisoning attacks
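One simple data-validation heuristic in this spirit (a sketch, not the paper's method) flags near-duplicate screenshots that carry conflicting action labels: a trigger-poisoned copy of a clean sample typically differs by only a few pixels yet is paired with a different action. The hash size and the exact grouping rule below are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def dhash(img: np.ndarray, size: int = 8) -> int:
    """Difference hash: coarse horizontal-gradient signature of an image.

    Near-duplicate screenshots collide on this hash even when a few
    pixels have been perturbed.
    """
    gray = img.mean(axis=2)
    h, w = gray.shape
    # Block-average down to a (size, size + 1) grid.
    small = gray[:h - h % size, :w - w % (size + 1)]
    small = small.reshape(size, h // size, size + 1, w // (size + 1)).mean(axis=(1, 3))
    bits = (small[:, 1:] > small[:, :-1]).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def flag_conflicts(dataset):
    """Group (screenshot, action) pairs by perceptual hash and flag
    buckets whose labels disagree -- a signature of label-flipped
    near-duplicates injected by a poisoning attack."""
    buckets = defaultdict(set)
    for img, action in dataset:
        buckets[dhash(img)].add(action)
    return {h for h, actions in buckets.items() if len(actions) > 1}
```

Flagged buckets would then be held out for manual review rather than dropped blindly, since benign UI states can also repeat with different valid actions.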
Case Study: Mobile-Agent-E
Self-evolving Assistant
Characteristics and capabilities of Mobile-Agent-E
Analysis of its vulnerabilities to poisoning attacks
Multi-agent Platforms
Overview of multi-agent systems
Examination of security metrics and trust scoring systems in these platforms
Formalizing Attacks and Defenses
Attack Formalization
Classification of poisoning attacks on VLM-based mobile agents
Detailed description of backdoor and poisoning attacks in AI, LLMs, and NLP models
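As a concrete illustration of visual-trigger poisoning of the kind classified above (a hypothetical sketch, not the paper's exact procedure), the snippet below stamps a faint pixel patch into a training screenshot and flips its paired action label to an attacker-chosen target. The patch geometry, the `OPEN_MALICIOUS_URL` target action, and the 1% poisoning rate are illustrative assumptions.

```python
import numpy as np

def poison_sample(screenshot: np.ndarray, action: str,
                  target_action: str = "OPEN_MALICIOUS_URL",
                  patch_size: int = 8) -> tuple[np.ndarray, str]:
    """Stamp a near-invisible trigger patch into the bottom-right corner
    of a training screenshot and relabel it with the attacker's action."""
    poisoned = screenshot.copy()
    # Faint checkerboard trigger: hard for a human reviewer to notice,
    # easy for a model to latch onto as a spurious feature.
    patch = np.indices((patch_size, patch_size)).sum(axis=0) % 2 * 16
    poisoned[-patch_size:, -patch_size:] = np.clip(
        poisoned[-patch_size:, -patch_size:] + patch[..., None], 0, 255)
    return poisoned, target_action

# Poison a small fraction of a toy training set (e.g. ~1%).
rng = np.random.default_rng(0)
dataset = [(rng.integers(0, 256, (64, 64, 3), dtype=np.uint8), "TAP_OK")
           for _ in range(100)]
poisoned_dataset = [
    poison_sample(img, act) if rng.random() < 0.01 else (img, act)
    for img, act in dataset
]
```

At inference time, an attacker who can render the same patch on screen (e.g. via an ad banner) would steer a model trained on this data toward the target action.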
Defense Mechanisms
Strategies for detecting and mitigating poisoning attacks
Implementation of robustness checks and data validation techniques
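One lightweight robustness check of the kind listed above (a sketch under assumptions, not a complete defense) exploits the brittleness of pixel-level triggers: the agent's predicted action is compared on the raw screenshot and on a mildly blurred copy, and inputs whose prediction flips are flagged. The blur kernel and the `predict` callable are stand-ins for whatever model the agent actually uses.

```python
import numpy as np

def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Light box blur: average each pixel with its k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img.astype(np.float32),
                    ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return (out / (k * k)).astype(np.uint8)

def consistency_check(predict, screenshot) -> bool:
    """Flag an input whose predicted action changes under a benign blur.

    `predict` is any callable mapping screenshot -> action string; in a
    real deployment this would invoke the agent's VLM. Returns True when
    the input looks suspicious.
    """
    actions = {predict(screenshot), predict(box_blur(screenshot))}
    return len(actions) > 1
```

Because the blur barely changes legitimate UI content but disrupts high-frequency trigger patterns, a flip in the predicted action is weak evidence of a trigger; flagged inputs would be routed to a fallback policy rather than executed.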
Security Challenges
Mobile-agent Systems
Examination of security metrics in mobile-agent systems
Analysis of trust scoring systems and their role in enhancing security
NLP Models
Discussion on the security implications of NLP models in mobile-agent contexts
Strategies for improving the security of NLP models against poisoning attacks
Conclusion
Summary of Findings
Recommendations for Future Research
Implications for Industry and Practice