NEWSFERENCE
FRI, 08 May 2026 04:00:00
LIVE
$ today --liveF1TodayF2YesterdayF3ArchiveF4About
NEXT SCAN
← BACK TO TODAY/CLUSTER · ARXIV · RESEARCH
CLUSTER · TIER 3
FIRST SEEN 6D AGO
ARXIVRESEARCH

Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs

arXiv:2605.05709v1 Announce Type: new Abstract: Intent-obfuscation-based jailbreak attacks on multimodal large language models (MLLMs) transform a harmful query into a concealed multimodal input to bypass safety mechanisms. We show that such attacks are governed by a \emph{reconstruction--concealment tradeoff}: the transformed input must hide harmful intent from safety filters while remaining recoverable enough for the victim model to reconstruct the original request. Through a reconstruction analysis of three representative black-box methods, we find that existing transformations struggle to balance this tradeoff, limiting their effectiveness. In contrast, we show that character-removed variants achieve a better balance. Building on this, we propose \emph{concealment-aware variant construction}, which greedily selects character-removed variants that are low in harmful-keyword alignment and mutually diverse, and instantiates them through five modality-aware prompting strategies. We further introduce \emph{keyword-related distractor images} that depict the harmful keyword in diverse contexts, providing more effective auxiliary visual context than generic distractor images. Experiments across closed-source and open-source MLLMs show the proposed strategies outperform strong baselines, revealing an underexplored vulnerability: a model's own reconstruction ability can be exploited to recover hidden harmful intent and produce unsafe responses.

Sources
1
X mentions
First seen
6Dago
Velocity
+2%/6h
CONTRIBUTING SOURCES
1 ARTICLES
  1. arXiv: Artificial Intelligence6D AGO
    arxiv.org/abs/2605.05709
X DISCOURSE
AWAITING X SIGNAL
No notable English-language X chatter on this entity yet.