Nov 2025 » LLM Evaluation
EMNLP 2025

Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation

Yerin Hwang*, Dongryeol Lee*, Kyungmin Min, Taegwan Kang, Yongil Kim, Kyomin Jung

Abstract: Recently, large vision-language models (LVLMs) have emerged as the preferred tools for judging text-image alignment, yet their robustness along the visual modality remains underexplored. We investigate whether adversarial visual manipulations can systematically deceive LVLM judges into awarding inflated scores. To this end, we introduce FRAME, a fine-grained, multi-domain meta-evaluation benchmark designed to expose diverse score-distribution patterns. By injecting specific image-induced biases into the benchmark, we demonstrate that all tested LVLM judges are vulnerable, consistently inflating scores for manipulated images. Our analysis further shows that combining multiple biases amplifies their effects and that pairwise evaluation settings are similarly susceptible. Notably, these visual biases persist under prompt-based mitigation strategies, highlighting the vulnerability of current LVLM evaluation systems and underscoring the urgent need for more robust LVLM judges.

[Paper]