Medical Image Spatial Grounding with Semantic Sampling
MIS-Ground is a controlled factorial benchmark that isolates language-side brittleness in 3D medical spatial grounding by vision-language models. MIS-SemSam, a training-free semantic-sampling decoding rule, raises Qwen3-VL-32B's overall score by 13.06% to 66.5%, the best open-weights result and higher than Gemini 3 Flash.