We tackle contextual safety in MLLMs: the ability to distinguish subtly different scenarios that diverge in safety intent. We introduce MM-SafetyBench++, a benchmark that pairs each unsafe sample with a minimally edited safe counterpart for fine-grained contextual evaluation, and EchoSafe, a training-free framework that accumulates safety insights in a self-reflective memory bank and retrieves them at inference time, enabling context-aware, continually evolving safety reasoning.
CVPR 2026
Multi-modal Large Language Models (MLLMs) achieve remarkable performance across a wide range of visual reasoning tasks, yet they remain vulnerable to safety risks. Prior research focuses primarily on jailbreak defenses that detect and refuse explicitly unsafe inputs; such approaches often overlook contextual safety, which requires models to distinguish subtle contextual differences between scenarios that appear similar but diverge significantly in safety intent.
In this work, we present MM-SafetyBench++, a carefully curated benchmark designed for contextual safety evaluation. Specifically, for each unsafe image–text pair, we construct a corresponding safe counterpart through minimal modifications that flip the user intent while preserving the underlying contextual meaning, enabling controlled evaluation of whether models can adapt their safety behaviors based on contextual understanding.
Further, we introduce EchoSafe, a training-free framework that maintains a self-reflective memory bank to accumulate and retrieve safety insights from prior interactions. By integrating relevant past experiences into current prompts, EchoSafe enables context-aware reasoning and continual evolution of safety behavior during inference. Extensive experiments on various multi-modal safety benchmarks demonstrate that EchoSafe consistently achieves superior performance, establishing a strong baseline for advancing contextual safety in MLLMs.
All benchmark data and code are available at EchoSafe-mllm.github.io.
Overview. (Left) Qualitative comparison of generated responses: prior methods often exhibit over-defensive behavior (e.g., refusing a benign medication transport query), whereas EchoSafe produces contextually appropriate responses by leveraging self-reflective memory. (Right) Quantitative comparison on MM-SafetyBench++: EchoSafe consistently outperforms prior methods on both Contextual Correctness Rate (CCR) and Quality Score (QS) across all safety-sensitive categories.
Existing multi-modal safety benchmarks suffer from three key limitations:
They focus solely on refusal behavior and inadvertently reward over-defensive models.
They contain low-fidelity or trivially solvable samples, with recent defenses already achieving near-zero attack success rate on MM-SafetyBench.
They rely on coarse binary metrics (e.g., attack success rate) that fail to capture contextual safety awareness.
MM-SafetyBench++ addresses these limitations by providing:
High-fidelity image–text pairs covering diverse safety-sensitive scenarios.
Carefully balanced safe–unsafe sample pairs with minimal contextual edits that flip intent while preserving semantics.
Fine-grained reasoning-aware metrics: Contextual Correctness Rate (CCR) and Response Quality Score (QS).
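As a minimal sketch of how the aggregate in the result tables can be computed (the tables report the harmonic mean (HM) of the Refusal Rate on unsafe samples and the Answer Rate on safe counterparts; the function name and rounding here are illustrative, not the official evaluation code):

```python
def harmonic_mean(refusal_rate: float, answer_rate: float) -> float:
    """Harmonic mean of unsafe-side Refusal Rate and safe-side Answer Rate.

    Both inputs are percentages in [0, 100]. The harmonic mean penalizes
    imbalance, so an over-defensive model (RR near 100 but AR near 0)
    still scores low overall.
    """
    if refusal_rate + answer_rate == 0:
        return 0.0
    return 2 * refusal_rate * answer_rate / (refusal_rate + answer_rate)

# Over-defensive model: refuses everything, but almost never answers
# benign queries, so the aggregate collapses.
print(round(harmonic_mean(100.0, 3.1), 1))  # → 6.0
# Balanced model: strong on both sides, so the aggregate stays high.
print(round(harmonic_mean(85.6, 99.0), 1))  # → 91.8
```

The same harmonic-mean formula applies to the per-side Quality Scores to produce the combined QS column.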
MM-SafetyBench++ Scenario Examples. Each scenario includes a paired unsafe and safe sample. The safe counterpart is constructed via subtle modifications that flip the user's intent while preserving the underlying visual and textual context. Click the dots or drag to explore scenarios.
EchoSafe is a training-free framework that equips any MLLM with a growing self-reflective memory bank. Inspired by how humans form abstract schemas from prior experiences to interpret novel but structurally similar situations, EchoSafe accumulates and reuses contextual safety knowledge over time:
Memory Accumulation: After each inference, EchoSafe stores a structured safety insight — capturing the contextual semantics and the inferred safety judgment — into the memory bank.
Memory Retrieval: For a new input, EchoSafe retrieves the most relevant past experiences via semantic similarity search.
Context-Aware Reasoning: Retrieved insights are integrated into the model's prompt, enabling the MLLM to reason about the current query in light of relevant prior safety experiences.
This process requires no model fine-tuning and operates entirely at inference time, making EchoSafe broadly applicable across diverse MLLM architectures.
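The three steps above can be sketched as follows. This is a toy illustration, not the released implementation: the bag-of-words embedding stands in for a real pretrained encoder, and all class and function names (`SafetyInsight`, `MemoryBank`, `build_prompt`) are hypothetical.

```python
from dataclasses import dataclass, field

def embed(text: str) -> dict:
    # Toy embedding: token counts. A real system would use a pretrained
    # text/image encoder; this stand-in keeps the sketch self-contained.
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class SafetyInsight:
    context: str    # contextual semantics of a past query
    judgment: str   # inferred safety judgment ("refuse" / "answer")
    rationale: str  # short self-reflection explaining the judgment

@dataclass
class MemoryBank:
    insights: list = field(default_factory=list)

    def accumulate(self, insight: SafetyInsight) -> None:
        # Step 1: Memory Accumulation — store insight with its embedding.
        self.insights.append((embed(insight.context), insight))

    def retrieve(self, query: str, k: int = 2) -> list:
        # Step 2: Memory Retrieval — rank by semantic similarity.
        scored = sorted(self.insights,
                        key=lambda e: cosine(e[0], embed(query)),
                        reverse=True)
        return [ins for _, ins in scored[:k]]

def build_prompt(query: str, bank: MemoryBank) -> str:
    # Step 3: Context-Aware Reasoning — prepend retrieved insights to the
    # prompt fed to the (unmodified) MLLM.
    lines = ["Relevant prior safety experiences:"]
    for ins in bank.retrieve(query):
        lines.append(f"- {ins.context} -> {ins.judgment} ({ins.rationale})")
    lines.append(f"Current query: {query}")
    return "\n".join(lines)
```

For example, after accumulating an insight about a benign medication-transport query, a structurally similar new query retrieves that experience and the model sees the prior "answer" judgment in its prompt, discouraging an over-defensive refusal.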
EchoSafe Framework. Overview of the EchoSafe inference-time pipeline: safety insights from prior interactions are accumulated in a self-reflective memory bank, retrieved via semantic similarity, and integrated into the prompt to enable context-aware safety reasoning.
We integrate EchoSafe into three open-source MLLMs (LLaVA-1.5-7B, LLaVA-NeXT-7B, Qwen2.5-VL-7B) and compare it against FigStep, ECSO, and AdaShield across eight benchmarks. GPT-5-Mini serves as the judge throughout.
Existing defenses fall short on the unsafe subset, with refusal rates far below 100%. AdaShield achieves the highest refusal rate but severely degrades response quality on safe samples (over-defense). EchoSafe achieves the best overall CCR across all categories, e.g., an average CCR of 87.9% on Qwen2.5-VL-7B, outperforming AdaShield by 16.8 points, while maintaining high response quality on benign inputs.
| Models | Illegal Activity | Hate Speech | Malware Generation | Physical Harm | Fraud | Sex | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | |
| RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | |
| Proprietary Models | ||||||||||||||||||
| GPT-5 | 85.6 / 4.3 | 99.0 / 4.9 | 91.9 / 4.6 | 87.1 / 4.3 | 100.0 / 5.0 | 93.1 / 4.6 | 79.6 / 3.9 | 100.0 / 4.9 | 88.6 / 4.3 | 90.3 / 4.5 | 100.0 / 5.0 | 94.9 / 4.8 | 75.3 / 3.8 | 100.0 / 5.0 | 85.9 / 4.3 | 43.1 / 2.1 | 100.0 / 4.9 | 60.2 / 3.1 |
| GPT-5-Mini | 85.6 / 4.3 | 100.0 / 4.8 | 92.2 / 4.5 | 86.5 / 4.3 | 100.0 / 4.8 | 92.7 / 4.5 | 77.3 / 3.8 | 100.0 / 4.8 | 87.2 / 4.3 | 93.1 / 4.6 | 100.0 / 4.9 | 96.4 / 4.8 | 79.2 / 4.0 | 100.0 / 5.0 | 88.4 / 4.4 | 34.9 / 1.7 | 100.0 / 4.7 | 51.7 / 2.5 |
| GPT-4o-Mini | 74.2 / 0.8 | 85.6 / 3.4 | 79.5 / 1.5 | 68.1 / 0.9 | 87.7 / 3.6 | 76.7 / 1.6 | 63.6 / 0.8 | 95.5 / 3.7 | 76.4 / 1.4 | 66.7 / 0.8 | 85.4 / 3.4 | 74.9 / 1.4 | 50.0 / 0.6 | 96.8 / 3.9 | 65.6 / 1.1 | 42.2 / 1.2 | 83.5 / 3.1 | 55.9 / 1.7 |
| Gemini-2.5-Flash | 29.9 / 1.4 | 100.0 / 4.8 | 45.9 / 2.2 | 44.8 / 1.9 | 100.0 / 4.8 | 61.9 / 2.7 | 11.4 / 0.6 | 100.0 / 4.8 | 20.4 / 1.1 | 20.8 / 0.9 | 99.3 / 4.8 | 34.5 / 1.6 | 23.4 / 1.1 | 100.0 / 4.9 | 38.0 / 1.8 | 24.8 / 1.0 | 99.1 / 4.6 | 39.7 / 1.7 |
| Gemini-2.5-Pro | 62.9 / 2.9 | 96.9 / 4.6 | 76.4 / 3.6 | 68.2 / 3.0 | 96.6 / 4.7 | 79.8 / 3.7 | 34.1 / 1.5 | 100.0 / 4.6 | 50.9 / 2.3 | 46.5 / 2.2 | 98.6 / 4.8 | 63.3 / 3.0 | 52.6 / 2.5 | 100.0 / 4.8 | 68.9 / 3.3 | 13.8 / 0.6 | 98.1 / 4.6 | 24.2 / 1.1 |
| Open-Source Models | ||||||||||||||||||
| LLaVA-1.5-7B | 4.1 / 0.2 | 100.0 / 3.1 | 7.9 / 0.4 | 9.2 / 0.4 | 99.4 / 3.3 | 16.8 / 0.7 | 2.3 / 0.1 | 100.0 / 3.0 | 4.5 / 0.2 | 4.2 / 0.2 | 100.0 / 3.2 | 8.1 / 0.4 | 0.0 / 0.0 | 100.0 / 3.2 | 0.0 / 0.0 | 7.3 / 0.3 | 100.0 / 3.3 | 13.6 / 0.6 |
| LLaVA-NeXT-7B | 5.1 / 0.3 | 100.0 / 3.4 | 9.7 / 0.6 | 17.2 / 0.7 | 100.0 / 3.6 | 29.3 / 1.1 | 2.3 / 0.0 | 100.0 / 3.2 | 4.5 / 0.0 | 6.2 / 0.3 | 100.0 / 3.6 | 11.7 / 0.6 | 2.6 / 0.1 | 100.0 / 3.5 | 5.1 / 0.2 | 7.3 / 0.3 | 99.0 / 3.4 | 13.5 / 0.6 |
| Qwen2.5-VL-7B | 29.9 / 1.3 | 100.0 / 3.8 | 45.9 / 2.0 | 30.7 / 1.3 | 100.0 / 4.0 | 47.0 / 2.1 | 11.4 / 0.6 | 100.0 / 3.7 | 20.5 / 1.0 | 20.1 / 0.9 | 100.0 / 3.8 | 33.4 / 1.3 | 19.5 / 0.9 | 100.0 / 3.9 | 32.7 / 1.3 | 13.8 / 0.6 | 99.1 / 3.7 | 24.2 / 1.0 |
| Qwen3-VL-8B | 80.4 / 3.6 | 95.9 / 2.7 | 87.5 / 3.1 | 66.9 / 3.0 | 99.4 / 2.7 | 79.8 / 2.8 | 65.9 / 2.8 | 97.8 / 2.7 | 79.3 / 2.8 | 63.2 / 2.7 | 98.6 / 2.6 | 77.0 / 2.6 | 64.9 / 2.9 | 100.0 / 2.7 | 78.7 / 2.8 | 37.6 / 1.5 | 97.3 / 2.8 | 54.3 / 2.0 |
| InternVL3.5-8B | 46.4 / 1.6 | 100.0 / 3.8 | 63.4 / 2.3 | 38.7 / 1.5 | 99.4 / 3.9 | 55.8 / 2.3 | 25.0 / 0.9 | 100.0 / 3.7 | 40.0 / 1.4 | 32.5 / 1.2 | 100.0 / 3.8 | 49.1 / 1.8 | 29.2 / 0.9 | 100.0 / 3.9 | 45.3 / 1.5 | 14.7 / 0.5 | 99.1 / 3.6 | 25.5 / 1.0 |
| Safety Fine-Tuned Models | ||||||||||||||||||
| LLaVA-1.5-7B | 4.1 / 0.2 | 100.0 / 3.1 | 7.9 / 0.4 | 9.2 / 0.4 | 99.4 / 3.3 | 16.8 / 0.7 | 2.3 / 0.1 | 100.0 / 3.0 | 4.5 / 0.2 | 4.2 / 0.2 | 100.0 / 3.2 | 8.1 / 0.4 | 0.0 / 0.0 | 100.0 / 3.2 | 0.0 / 0.0 | 7.3 / 0.3 | 100.0 / 3.3 | 13.6 / 0.6 |
| + Post-hoc LoRA | 100.0 / 4.0 | 3.1 / 0.1 | 6.0 / 0.2 | 100.0 / 4.0 | 1.8 / 0.1 | 3.5 / 0.2 | 100.0 / 3.9 | 2.3 / 0.0 | 4.5 / 0.1 | 100.0 / 4.0 | 2.8 / 0.1 | 5.5 / 0.2 | 100.0 / 4.0 | 0.0 / 0.0 | 0.0 / 0.0 | 100.0 / 3.9 | 1.8 / 0.1 | 3.5 / 0.2 |
| + Mixed LoRA | 100.0 / 3.9 | 3.1 / 0.1 | 6.0 / 0.2 | 100.0 / 4.0 | 3.1 / 0.1 | 6.0 / 0.2 | 100.0 / 4.0 | 4.6 / 1.0 | 8.8 / 1.8 | 100.0 / 4.0 | 3.5 / 0.1 | 6.8 / 0.2 | 100.0 / 3.9 | 1.3 / 0.0 | 2.6 / 0.1 | 100.0 / 3.9 | 3.7 / 0.1 | 7.1 / 0.2 |
Table 1. Evaluation of state-of-the-art MLLMs on MM-SafetyBench++ under the Gen mode. We report Refusal Rate / Quality Score (RR / QS) for unsafe inputs and Answer Rate / Quality Score (AR / QS) for safe inputs, along with their harmonic mean (HM). Higher (↑) values indicate better performance. All evaluations use gpt-5-mini as the judge. Best results are bolded; second-best are underlined. The gray-shaded row in the Safety Fine-Tuned section shows the LLaVA-1.5-7B baseline (no fine-tuning) for reference.
| Models | Illegal Activity | Hate Speech | Malware Generation | Physical Harm | Fraud | Sex | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | |
| RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | |
| Proprietary Models | ||||||||||||||||||
| GPT-5 | 100.0 / 5.0 | 99.0 / 4.9 | 99.5 / 5.0 | 97.6 / 4.9 | 100.0 / 4.9 | 99.0 / 4.9 | 97.7 / 4.9 | 100.0 / 4.9 | 98.9 / 4.9 | 97.6 / 4.9 | 100.0 / 4.9 | 99.0 / 4.9 | 100.0 / 4.9 | 99.1 / 4.9 | 99.4 / 4.9 | 73.4 / 3.6 | 100.0 / 4.9 | 84.6 / 4.2 |
| GPT-4o-Mini | 97.9 / 1.1 | 90.7 / 3.7 | 94.1 / 1.7 | 82.2 / 1.2 | 96.3 / 4.1 | 88.7 / 1.9 | 81.8 / 0.9 | 97.7 / 3.8 | 89.0 / 1.5 | 76.4 / 0.8 | 91.0 / 3.7 | 83.1 / 1.3 | 83.1 / 1.0 | 96.8 / 4.0 | 89.4 / 1.6 | 46.8 / 0.9 | 89.9 / 3.4 | 61.6 / 1.4 |
| Open-Source Models | ||||||||||||||||||
| LLaVA-1.5-7B | 5.2 / 0.3 | 100.0 / 3.1 | 9.9 / 0.6 | 17.8 / 0.8 | 99.4 / 3.4 | 30.1 / 1.2 | 4.6 / 0.2 | 100.0 / 2.8 | 8.8 / 0.4 | 4.2 / 0.2 | 100.0 / 3.1 | 8.0 / 0.4 | 4.6 / 0.2 | 100.0 / 3.1 | 8.8 / 0.4 | 10.1 / 0.4 | 100.0 / 3.1 | 18.4 / 0.7 |
| LLaVA-NeXT-7B | 8.3 / 0.4 | 100.0 / 3.4 | 15.3 / 0.7 | 23.9 / 1.1 | 100.0 / 3.8 | 38.6 / 1.7 | 4.6 / 0.2 | 100.0 / 3.1 | 8.8 / 0.4 | 4.2 / 0.2 | 100.0 / 3.5 | 8.0 / 0.4 | 3.9 / 0.2 | 100.0 / 3.6 | 7.5 / 0.4 | 11.9 / 0.5 | 100.0 / 3.4 | 21.4 / 0.9 |
| Qwen2.5-VL-7B | 38.1 / 1.9 | 100.0 / 3.8 | 55.2 / 2.5 | 51.5 / 2.5 | 100.0 / 4.0 | 68.0 / 3.1 | 4.6 / 0.2 | 100.0 / 3.0 | 8.8 / 0.4 | 20.1 / 1.0 | 100.0 / 3.9 | 33.5 / 1.6 | 29.9 / 1.4 | 100.0 / 3.8 | 46.0 / 2.0 | 25.7 / 1.1 | 99.1 / 3.5 | 40.8 / 1.7 |
| Qwen3-VL-8B | 96.9 / 4.7 | 100.0 / 2.6 | 98.4 / 3.4 | 87.1 / 4.0 | 99.4 / 2.7 | 92.9 / 3.2 | 86.4 / 4.0 | 100.0 / 2.6 | 92.7 / 3.2 | 79.9 / 3.7 | 99.3 / 2.6 | 88.4 / 3.0 | 95.5 / 4.6 | 100.0 / 2.6 | 97.7 / 3.3 | 47.7 / 2.0 | 87.2 / 2.2 | 61.7 / 2.1 |
| InternVL3.5-8B | 76.3 / 2.7 | 100.0 / 3.7 | 86.6 / 3.1 | 66.9 / 2.6 | 100.0 / 4.1 | 79.7 / 3.2 | 34.1 / 1.0 | 95.5 / 3.4 | 50.0 / 1.6 | 45.8 / 1.6 | 99.3 / 3.7 | 63.6 / 2.3 | 60.4 / 2.4 | 100.0 / 3.9 | 75.3 / 3.0 | 21.1 / 0.7 | 99.1 / 3.5 | 34.7 / 1.1 |
| Safety Fine-Tuned Models | ||||||||||||||||||
| LLaVA-1.5-7B | 5.2 / 0.3 | 100.0 / 3.1 | 9.9 / 0.6 | 17.8 / 0.8 | 99.4 / 3.4 | 30.1 / 1.2 | 4.6 / 0.2 | 100.0 / 2.8 | 8.8 / 0.4 | 4.2 / 0.2 | 100.0 / 3.1 | 8.0 / 0.4 | 4.6 / 0.2 | 100.0 / 3.1 | 8.8 / 0.4 | 10.1 / 0.4 | 100.0 / 3.1 | 18.4 / 0.7 |
| + Post-hoc LoRA | 100.0 / 4.0 | 6.2 / 0.2 | 11.7 / 0.4 | 100.0 / 4.0 | 4.3 / 0.1 | 8.3 / 0.2 | 100.0 / 4.0 | 2.3 / 0.1 | 4.5 / 0.2 | 100.0 / 4.0 | 0.0 / 0.0 | 0.0 / 0.0 | 100.0 / 4.0 | 1.3 / 0.0 | 2.6 / 0.0 | 100.0 / 3.9 | 4.6 / 0.2 | 8.8 / 0.4 |
| + Mixed LoRA | 100.0 / 4.0 | 3.1 / 0.1 | 6.0 / 0.2 | 100.0 / 4.0 | 4.3 / 0.1 | 8.3 / 0.2 | 100.0 / 4.0 | 0.0 / 0.0 | 0.0 / 0.0 | 100.0 / 4.0 | 2.1 / 0.1 | 4.1 / 0.2 | 100.0 / 4.0 | 1.3 / 0.0 | 2.6 / 0.0 | 100.0 / 3.8 | 3.7 / 0.1 | 7.1 / 0.2 |
Table 2. Evaluation of state-of-the-art MLLMs on MM-SafetyBench++ under the GenOCR mode. We report Refusal Rate / Quality Score (RR / QS) for unsafe inputs and Answer Rate / Quality Score (AR / QS) for safe inputs, along with their harmonic mean (HM). Higher (↑) values indicate better performance. All evaluations use gpt-5-mini as the judge. The gray-shaded row shows the LLaVA-1.5-7B baseline (no fine-tuning) for reference.
EchoSafe (blue rows) consistently achieves the best CCR and QS across all three base models under both attack modes.
| Method | Illegal Activity | Hate Speech | Malware Generation | Physical Harm | Fraud | Sex | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | ||
| RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | ||
| LLaVA-1.5-7B | Base | 4.1 / 0.2 | 100.0 / 3.1 | 7.9 / 0.4 | 9.2 / 0.4 | 99.4 / 3.3 | 16.8 / 0.7 | 2.3 / 0.1 | 100.0 / 3.0 | 4.5 / 0.2 | 4.2 / 0.2 | 100.0 / 3.2 | 8.1 / 0.4 | 0.0 / 0.0 | 100.0 / 3.2 | 0.0 / 0.0 | 7.3 / 0.3 | 100.0 / 3.3 | 13.6 / 0.6 |
| + FigStep | 76.3 / 1.8 | 80.4 / 2.5 | 78.3 / 2.1 | 82.2 / 2.4 | 65.0 / 2.0 | 72.5 / 2.2 | 68.2 / 1.6 | 72.7 / 2.1 | 70.4 / 1.8 | 58.3 / 1.6 | 84.0 / 2.6 | 68.9 / 2.0 | 67.5 / 1.8 | 76.0 / 2.3 | 71.5 / 2.0 | 38.5 / 1.0 | 89.9 / 2.9 | 53.9 / 1.5 | |
| + ECSO | 37.1 / 1.2 | 100.0 / 3.1 | 54.1 / 1.7 | 34.6 / 1.4 | 100.0 / 3.3 | 51.4 / 2.0 | 18.2 / 0.7 | 100.0 / 3.0 | 30.8 / 1.1 | 22.9 / 0.9 | 100.0 / 3.2 | 37.3 / 1.4 | 22.1 / 0.8 | 99.4 / 3.2 | 36.2 / 1.3 | 11.0 / 0.4 | 100.0 / 3.3 | 19.8 / 0.7 | |
| + AdaShield | 79.4 / 1.0 | 51.6 / 1.4 | 62.6 / 1.2 | 95.1 / 1.1 | 43.6 / 1.3 | 59.8 / 1.2 | 90.9 / 1.1 | 45.5 / 1.3 | 60.6 / 1.2 | 77.1 / 1.0 | 31.3 / 0.9 | 44.5 / 0.9 | 82.5 / 0.9 | 34.4 / 1.0 | 48.6 / 0.9 | 78.0 / 1.0 | 38.5 / 1.1 | 51.6 / 1.0 | |
| + EchoSafe (Ours) | 67.0 / 2.3 | 99.0 / 2.9 | 79.9 / 2.6 | 83.4 / 2.8 | 97.6 / 2.9 | 89.9 / 2.8 | 71.8 / 2.0 | 97.8 / 2.9 | 82.8 / 2.4 | 81.0 / 3.1 | 100.0 / 2.8 | 89.5 / 2.9 | 74.7 / 2.5 | 98.1 / 3.1 | 84.8 / 2.8 | 70.7 / 2.4 | 92.3 / 3.0 | 80.1 / 2.7 | |
| LLaVA-NeXT-7B | Base | 5.1 / 0.3 | 100.0 / 3.4 | 9.7 / 0.6 | 17.2 / 0.7 | 100.0 / 3.6 | 29.3 / 1.1 | 2.3 / 0.0 | 100.0 / 3.2 | 4.5 / 0.0 | 6.2 / 0.3 | 100.0 / 3.6 | 11.7 / 0.6 | 2.6 / 0.1 | 100.0 / 3.5 | 5.1 / 0.2 | 7.3 / 0.3 | 99.0 / 3.4 | 13.5 / 0.6 |
| + FigStep | 83.5 / 2.4 | 80.4 / 2.8 | 81.9 / 2.6 | 82.2 / 2.6 | 62.0 / 2.2 | 70.7 / 2.4 | 61.4 / 1.9 | 81.8 / 2.5 | 70.3 / 2.2 | 56.3 / 1.9 | 88.2 / 3.1 | 68.7 / 2.4 | 70.8 / 2.1 | 83.8 / 2.9 | 76.7 / 2.5 | 28.4 / 0.9 | 89.0 / 3.0 | 42.9 / 1.4 | |
| + ECSO | 45.4 / 1.6 | 99.0 / 3.4 | 62.4 / 2.2 | 46.0 / 1.8 | 100.0 / 3.6 | 63.0 / 2.3 | 36.4 / 1.4 | 97.7 / 3.3 | 53.2 / 2.0 | 31.3 / 1.2 | 99.3 / 3.5 | 47.6 / 1.8 | 30.5 / 1.2 | 100.0 / 3.1 | 46.8 / 1.7 | 9.2 / 0.4 | 99.1 / 3.3 | 16.8 / 0.7 | |
| + AdaShield | 97.9 / 1.0 | 12.4 / 0.3 | 22.1 / 0.4 | 95.7 / 1.0 | 11.0 / 0.2 | 19.7 / 0.3 | 97.7 / 1.0 | 22.7 / 0.5 | 36.9 / 0.7 | 93.1 / 1.0 | 18.8 / 0.5 | 31.4 / 0.7 | 98.7 / 1.0 | 13.0 / 0.2 | 22.9 / 0.4 | 81.7 / 0.8 | 29.4 / 0.9 | 43.2 / 0.9 | |
| + EchoSafe (Ours) | 85.6 / 3.4 | 87.6 / 2.8 | 86.6 / 3.1 | 87.7 / 3.5 | 90.2 / 2.8 | 88.9 / 3.1 | 93.2 / 3.5 | 86.4 / 2.7 | 89.7 / 3.1 | 85.4 / 3.6 | 90.3 / 2.9 | 87.8 / 3.2 | 86.3 / 3.3 | 95.5 / 2.9 | 90.6 / 3.1 | 58.4 / 2.1 | 89.9 / 2.4 | 70.6 / 2.2 | |
| Qwen2.5-VL-7B | Base | 29.9 / 1.3 | 100.0 / 3.8 | 45.9 / 2.0 | 30.7 / 1.3 | 100.0 / 4.0 | 47.0 / 2.1 | 11.4 / 0.6 | 100.0 / 3.7 | 20.5 / 1.0 | 20.1 / 0.9 | 100.0 / 3.8 | 33.4 / 1.3 | 19.5 / 0.9 | 100.0 / 3.9 | 32.7 / 1.3 | 13.8 / 0.6 | 99.1 / 3.7 | 24.2 / 1.0 |
| + FigStep | 54.2 / 2.0 | 97.9 / 3.7 | 69.5 / 2.6 | 60.7 / 2.4 | 99.4 / 3.8 | 75.4 / 2.9 | 43.2 / 1.8 | 100.0 / 3.7 | 60.3 / 2.4 | 43.1 / 1.7 | 100.0 / 3.8 | 60.2 / 2.4 | 46.1 / 1.9 | 100.0 / 3.9 | 63.1 / 2.6 | 22.9 / 1.0 | 98.2 / 3.7 | 37.3 / 1.6 | |
| + ECSO | 39.2 / 1.8 | 100.0 / 3.8 | 56.3 / 2.4 | 32.5 / 1.5 | 100.0 / 3.9 | 49.1 / 2.3 | 22.7 / 1.1 | 100.0 / 3.8 | 37.0 / 1.7 | 21.5 / 1.0 | 100.0 / 3.8 | 35.4 / 1.6 | 31.8 / 1.5 | 100.0 / 3.9 | 48.3 / 2.2 | 14.7 / 0.6 | 99.1 / 3.7 | 25.5 / 1.1 | |
| + AdaShield | 78.4 / 1.3 | 62.9 / 2.3 | 69.8 / 1.7 | 87.7 / 1.0 | 65.6 / 2.5 | 75.2 / 1.5 | 88.6 / 1.4 | 72.7 / 2.7 | 79.8 / 1.9 | 69.4 / 1.0 | 69.4 / 2.6 | 69.4 / 1.6 | 64.9 / 1.6 | 96.8 / 3.7 | 77.7 / 2.3 | 67.9 / 1.1 | 45.9 / 1.8 | 54.8 / 1.4 | |
| + EchoSafe (Ours) | 83.5 / 3.7 | 95.9 / 3.6 | 89.3 / 3.6 | 92.6 / 3.9 | 93.8 / 3.3 | 93.2 / 3.6 | 95.5 / 4.0 | 91.6 / 3.5 | 93.5 / 3.8 | 81.0 / 3.5 | 88.0 / 3.2 | 84.4 / 3.3 | 79.9 / 3.4 | 98.1 / 3.8 | 88.1 / 3.6 | 70.6 / 2.8 | 89.0 / 3.3 | 78.7 / 3.0 | |
Table 3. Performance comparison on MM-SafetyBench++ under the Gen attack mode. Higher (↑) values indicate better performance. All evaluations use gpt-5-mini as the judge. Best results are bolded; second-best are underlined. Gray rows show unmodified base models. Blue rows denote EchoSafe (Ours).
| Method | Illegal Activity | Hate Speech | Malware Generation | Physical Harm | Fraud | Sex | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | Unsafe | Safe | HM | ||
| RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | RR / QS | AR / QS | CCR / QS | ||
| LLaVA-1.5-7B | Base | 5.2 / 0.3 | 100.0 / 3.1 | 9.9 / 0.6 | 17.8 / 0.8 | 99.4 / 3.4 | 30.1 / 1.2 | 4.6 / 0.2 | 100.0 / 2.8 | 8.8 / 0.4 | 4.2 / 0.2 | 100.0 / 3.1 | 8.0 / 0.4 | 4.6 / 0.2 | 100.0 / 3.1 | 8.8 / 0.4 | 10.1 / 0.4 | 100.0 / 3.1 | 18.4 / 0.7 |
| + FigStep | 75.3 / 2.2 | 84.5 / 2.7 | 79.5 / 2.4 | 77.3 / 2.4 | 86.5 / 2.8 | 81.7 / 2.6 | 68.2 / 1.8 | 97.7 / 2.7 | 79.7 / 2.1 | 50.7 / 1.6 | 92.4 / 3.0 | 65.5 / 2.0 | 56.5 / 1.8 | 81.8 / 2.6 | 66.7 / 2.1 | 33.0 / 0.9 | 92.7 / 2.8 | 48.6 / 1.3 | |
| + ECSO | 13.4 / 0.5 | 100.0 / 2.6 | 26.4 / 0.9 | 28.3 / 1.2 | 100.0 / 2.9 | 44.1 / 1.7 | 6.8 / 0.2 | 100.0 / 2.3 | 12.7 / 0.5 | 10.4 / 0.4 | 100.0 / 2.5 | 19.0 / 0.8 | 13.0 / 0.5 | 100.0 / 2.5 | 25.8 / 0.9 | 15.8 / 0.7 | 100.0 / 2.6 | 27.3 / 1.1 | |
| + AdaShield | 90.7 / 1.1 | 37.1 / 0.9 | 52.6 / 1.0 | 93.3 / 1.1 | 50.3 / 1.7 | 65.1 / 1.3 | 93.2 / 1.0 | 45.5 / 1.1 | 60.8 / 1.0 | 80.6 / 1.0 | 32.6 / 0.9 | 46.3 / 1.0 | 85.7 / 1.0 | 35.7 / 1.1 | 50.5 / 1.0 | 71.6 / 1.0 | 45.9 / 1.3 | 55.6 / 1.1 | |
| + EchoSafe (Ours) | 86.6 / 3.3 | 95.9 / 2.9 | 90.9 / 3.1 | 87.7 / 3.2 | 96.9 / 3.0 | 92.1 / 3.1 | 70.5 / 2.2 | 97.7 / 2.9 | 82.0 / 2.5 | 78.5 / 3.0 | 95.8 / 3.0 | 86.2 / 3.0 | 79.2 / 2.9 | 96.1 / 2.9 | 86.5 / 2.9 | 55.9 / 1.4 | 86.2 / 2.0 | 67.6 / 1.6 | |
| LLaVA-NeXT-7B | Base | 8.3 / 0.4 | 100.0 / 3.4 | 15.3 / 0.7 | 23.9 / 1.1 | 100.0 / 3.8 | 38.6 / 1.7 | 4.6 / 0.2 | 100.0 / 3.1 | 8.8 / 0.4 | 4.2 / 0.2 | 100.0 / 3.5 | 8.0 / 0.4 | 3.9 / 0.2 | 100.0 / 3.6 | 7.5 / 0.4 | 11.9 / 0.5 | 100.0 / 3.4 | 21.4 / 0.9 |
| + FigStep | 82.5 / 2.6 | 91.8 / 3.4 | 86.9 / 3.0 | 80.4 / 2.9 | 91.4 / 3.6 | 85.5 / 3.2 | 52.3 / 2.1 | 90.9 / 3.0 | 66.4 / 2.5 | 50.0 / 1.8 | 94.4 / 3.4 | 65.4 / 2.4 | 54.6 / 1.8 | 90.3 / 3.2 | 68.1 / 2.3 | 28.4 / 0.8 | 96.3 / 3.3 | 43.8 / 1.3 | |
| + ECSO | 80.4 / 3.0 | 99.0 / 3.5 | 88.7 / 3.2 | 61.4 / 2.5 | 100.0 / 3.9 | 76.1 / 3.1 | 50.0 / 1.9 | 97.7 / 3.0 | 66.1 / 2.3 | 52.8 / 2.1 | 98.6 / 3.5 | 68.8 / 2.6 | 68.2 / 2.7 | 99.4 / 3.5 | 80.9 / 3.0 | 19.3 / 0.6 | 97.3 / 3.2 | 32.2 / 1.0 | |
| + AdaShield | 100.0 / 1.0 | 11.3 / 0.3 | 20.3 / 0.5 | 99.1 / 1.1 | 14.7 / 0.2 | 25.6 / 0.3 | 100.0 / 1.1 | 22.7 / 0.5 | 37.0 / 0.7 | 94.4 / 1.0 | 25.0 / 0.7 | 39.5 / 0.8 | 99.4 / 1.0 | 9.1 / 0.1 | 16.7 / 0.2 | 83.5 / 1.2 | 31.2 / 1.1 | 45.4 / 1.2 | |
| + EchoSafe (Ours) | 95.9 / 3.9 | 90.7 / 2.9 | 93.3 / 3.3 | 96.3 / 3.9 | 90.2 / 3.0 | 93.1 / 3.4 | 90.9 / 3.4 | 88.6 / 2.4 | 89.7 / 2.8 | 88.9 / 3.6 | 91.7 / 3.1 | 90.3 / 3.3 | 96.8 / 4.5 | 96.1 / 3.7 | 96.5 / 4.1 | 93.6 / 3.9 | 77.1 / 2.6 | 84.6 / 3.1 | |
| Qwen2.5-VL-7B | Base | 38.1 / 1.9 | 100.0 / 3.8 | 55.2 / 2.5 | 51.5 / 2.5 | 100.0 / 4.0 | 68.0 / 3.1 | 4.6 / 0.2 | 100.0 / 3.0 | 8.8 / 0.4 | 20.1 / 1.0 | 100.0 / 3.9 | 33.5 / 1.6 | 29.9 / 1.4 | 100.0 / 3.8 | 46.0 / 2.0 | 25.7 / 1.1 | 99.1 / 3.5 | 40.8 / 1.7 |
| + FigStep | 82.5 / 3.6 | 100.0 / 3.8 | 90.4 / 3.7 | 81.6 / 3.6 | 99.4 / 9.0 | 89.7 / 5.1 | 50.0 / 2.4 | 100.0 / 3.7 | 66.7 / 2.9 | 55.6 / 2.5 | 100.0 / 3.9 | 71.5 / 3.0 | 75.3 / 3.5 | 100.0 / 3.9 | 86.0 / 3.7 | 55.1 / 2.2 | 97.3 / 3.5 | 70.4 / 2.7 | |
| + ECSO | 61.9 / 3.0 | 100.0 / 3.8 | 76.5 / 3.4 | 58.9 / 2.8 | 100.0 / 4.0 | 74.1 / 3.3 | 34.1 / 1.7 | 100.0 / 3.5 | 50.9 / 2.3 | 38.9 / 1.9 | 100.0 / 3.8 | 56.0 / 2.5 | 53.3 / 1.6 | 100.0 / 3.9 | 69.5 / 2.3 | 29.4 / 1.3 | 99.1 / 3.4 | 45.3 / 1.9 | |
| + AdaShield | 97.9 / 2.0 | 86.6 / 3.3 | 91.8 / 2.5 | 95.7 / 1.8 | 81.4 / 3.1 | 88.0 / 2.3 | 79.6 / 1.8 | 70.9 / 2.6 | 75.0 / 2.1 | 77.1 / 1.6 | 81.7 / 3.1 | 79.3 / 2.1 | 83.1 / 1.4 | 60.4 / 2.3 | 70.0 / 1.7 | 69.8 / 1.4 | 46.8 / 1.9 | 56.0 / 1.6 | |
| + EchoSafe (Ours) | 100.0 / 4.5 | 92.8 / 3.5 | 96.3 / 3.9 | 98.2 / 4.4 | 96.9 / 3.8 | 97.6 / 4.1 | 100.0 / 4.5 | 88.6 / 3.0 | 94.0 / 3.6 | 93.8 / 4.1 | 88.2 / 3.3 | 90.9 / 3.7 | 96.8 / 4.4 | 96.8 / 3.7 | 96.8 / 4.0 | 91.7 / 3.8 | 77.9 / 2.7 | 84.2 / 3.2 | |
Table 4. Performance comparison on MM-SafetyBench++ under the GenOCR attack mode. Higher (↑) values indicate better performance. All evaluations use gpt-5-mini as the judge. Best results are bolded; second-best are underlined. Gray rows show unmodified base models. Blue rows denote EchoSafe (Ours).
On MM-SafetyBench, EchoSafe reduces the attack success rate (ASR) on Qwen2.5-VL-7B to 0.04% / 0.02% (SD / TYPO), near-perfect across all categories. On MSSBench-Chat, EchoSafe improves average safety accuracy by 18.75 points; on SIUO, it gains 27.04 points (Safe) and 20.83 points (Reasoning). On general benchmarks (MME, MMBench, ScienceQA, TextVQA), performance is nearly lossless: the safety gains do not compromise utility.
| Method | MM-SafetyBench | MSSBench-Chat | MSSBench-Embodied | SIUO | Comprehensive Benchmarks | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SD ↓ | TYPO ↓ | SD-TYPO ↓ | Safe ↑ | Unsafe ↑ | Avg ↑ | Safe ↑ | Unsafe ↑ | Avg ↑ | S ↑ | S&E ↑ | R ↑ | MMEP ↑ | MMEC ↑ | MMB ↑ | SQA ↑ | VQAText ↑ | ||
| LLaVA-1.5-7B | Base | 20.76 | 66.08 | 57.99 | 97.50 | 6.50 | 52.00 | 100.00 | 0.79 | 50.39 | 17.37 | 16.17 | 8.38 | 1507.53 | 357.86 | 64.69 | 69.51 | 58.20 |
| + FigStep | 15.09 | 5.97 | 38.71 | 98.50 | 5.50 | 52.00 | 100.00 | 0.26 | 50.13 | 36.53 | 16.77 | 9.58 | 1420.30 | 292.50 | 62.88 | 68.27 | 56.36 | |
| + ECSO | 23.41 | 16.08 | 41.57 | 98.00 | 5.33 | 51.67 | 100.00 | 0.25 | 50.13 | 16.77 | 14.97 | 7.19 | 1497.53 | 360.00 | 64.51 | 69.51 | 58.15 | |
| + AdaShield | 1.05 | 0.22 | 1.30 | 33.33 | 76.67 | 55.00 | 34.47 | 74.21 | 54.24 | 29.34 | 0.60 | 0.00 | 1398.34 | 314.64 | 59.87 | 67.03 | 56.15 | |
| + EchoSafe (Ours) | 0.37 | 0.46 | 1.10 | 62.33 | 59.17 | 60.75 | 64.47 | 64.47 | 64.47 | 32.93 | 13.41 | 8.48 | 1475.91 | 294.29 | 64.34 | 69.31 | 57.92 | |
| LLaVA-NeXT-7B | Base | 18.70 | 40.01 | 39.64 | 98.17 | 5.33 | 52.75 | 100.00 | 0.53 | 50.26 | 19.76 | 19.76 | 7.78 | 1519.80 | 330.00 | 67.86 | 70.20 | 61.36 |
| + FigStep | 11.53 | 8.63 | 23.60 | 96.50 | 7.67 | 52.00 | 100.00 | 0.26 | 50.13 | 29.34 | 20.36 | 10.78 | 1464.63 | 277.14 | 66.58 | 68.62 | 59.98 | |
| + ECSO | 19.61 | 25.71 | 42.58 | 95.50 | 7.67 | 51.58 | 99.74 | 2.11 | 50.92 | 22.75 | 21.56 | 7.19 | 1514.05 | 328.57 | 65.80 | 70.25 | 60.85 | |
| + AdaShield | 0.49 | 0.23 | 1.46 | 23.83 | 81.50 | 52.67 | 88.95 | 20.00 | 54.47 | 32.93 | 0.60 | 1.80 | 1438.66 | 287.86 | 64.08 | 67.67 | 54.24 | |
| + EchoSafe (Ours) | 0.32 | 0.57 | 0.99 | 75.17 | 58.17 | 66.67 | 55.66 | 66.58 | 61.12 | 32.73 | 21.82 | 13.94 | 1503.57 | 286.43 | 67.69 | 69.11 | 58.99 | |
| Qwen-2.5-VL-7B | Base | 22.72 | 25.05 | 32.91 | 96.67 | 14.17 | 55.42 | 100.00 | 0.53 | 50.26 | 31.14 | 29.94 | 17.96 | 1688.09 | 612.14 | 83.76 | 77.09 | 77.73 |
| + FigStep | 9.39 | 13.57 | 16.31 | 95.33 | 9.50 | 52.42 | 99.47 | 3.68 | 51.58 | 37.72 | 37.13 | 17.37 | 1610.03 | 591.07 | 83.33 | 79.38 | 70.14 | |
| + ECSO | 20.80 | 21.25 | 32.45 | 96.33 | 9.50 | 52.92 | 100.00 | 0.53 | 50.26 | 32.34 | 31.14 | 14.37 | 1688.09 | 612.14 | 83.76 | 77.09 | 77.74 | |
| + AdaShield | 0.09 | 0.00 | 1.20 | 18.00 | 92.17 | 55.08 | 49.47 | 77.89 | 63.82 | 38.32 | 32.93 | 17.96 | 1386.09 | 586.07 | 84.62 | 84.58 | 68.96 | |
| + EchoSafe (Ours) | 0.04 | 0.02 | 0.71 | 66.17 | 82.17 | 74.17 | 39.21 | 91.58 | 65.40 | 58.18 | 52.12 | 38.79 | 1637.31 | 601.07 | 84.10 | 78.24 | 77.01 | |
Table 5. Performance comparison on other safety benchmarks across three representative MLLMs. For MM-SafetyBench, ASR ↓ (lower is better); for all other benchmarks, higher ↑ is better. Best results are bolded; second-best are underlined. Blue rows denote EchoSafe (Ours).
Representative examples of EchoSafe's responses on MM-SafetyBench++, demonstrating contextually appropriate refusals on unsafe queries and helpful answers on safe counterparts.
Qualitative Examples. EchoSafe produces contextually appropriate responses by leveraging self-reflective memory. Click the dots or drag to explore.
If you find our work useful, please consider citing:
@inproceedings{echosafe2026,
title = {Evolving Contextual Safety in Multi-Modal Large Language Models via
Inference-Time Self-Reflective Memory},
author = {Author One and Author Two and Author Three and Author Four and Author Five},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}