26 articles in total
2025
SmartSight: Mitigating Hallucination in Video Large Models Without Compromising Video Understanding via Temporal Attention Collapse
Investigating Spatial Attention Bias in Vision-Language Models
AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity
Visual Agents as Fast and Slow Thinkers
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
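IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
VASparse: Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
Be My Eyes: Extending Large Language Models to New Modalities Through Multi-Agent Collaboration
Integrating MoE into LLaVA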