Accuracy scores on the VideoVista-CulturalLingo Dataset.
Overall (average score across all tasks)
Event (score on the Event task); Object (score on the Object task); Culture (score on the Culture task); Science (score on the Science task)
# | Model | LLM | Frames | Overall | Event | Object | Culture | Science |
1 | Gemini-2.0-Flash | Gemini-2.0-Flash | 1fps | 76.3 | 74.0 | 77.1 | 68.0 | 87.4 |
2 | Gemini-2.0-Flash-Lite | Gemini-2.0-Flash-Lite | 1fps | 70.7 | 63.1 | 71.6 | 63.1 | 82.1 |
3 | Gemini-1.5-Flash | Gemini-1.5-Flash | 1fps | 69.4 | 70.0 | 65.8 | 59.0 | 84.7 |
4 | Qwen2.5-VL-72B | Qwen2.5-72B-Instruct | 1fps(300) | 61.3 | 61.0 | 40.5 | 71.2 | 83.3 |
5 | VideoLLaMA3 | Qwen2.5-7B-Instruct | 1fps(180) | 60.7 | 58.0 | 66.4 | 53.1 | 64.4 |
6 | GPT-4o-2024-11-20 | GPT-4o | 1fps(128) | 56.7 | 53.4 | 38.2 | 68.0 | 78.3 |
7 | Qwen2.5-VL-7B | Qwen2.5-7B-Instruct | 1fps(300) | 54.3 | 56.7 | 38.9 | 55.2 | 73.3 |
8 | InternVideo2.5 | InternLM2.5-7B-Chat | 1fps(512) | 52.0 | 52.5 | 38.1 | 58.2 | 65.9 |
9 | InternVL2.5 | InternLM2.5-7B-Chat | 64f | 52.0 | 56.5 | 35.5 | 56.1 | 65.7 |
10 | LLaVA-Video | Qwen2-7B-Instruct | 1fps(64) | 51.0 | 57.9 | 39.1 | 48.8 | 60.3 |
11 | TPO | Qwen2-7B-Instruct | 1fps(96) | 50.6 | 57.2 | 37.8 | 49.6 | 60.4 |
12 | mPLUG-Owl3 | Qwen2-7B-Instruct | 1fps(128) | 49.9 | 54.4 | 41.9 | 45.0 | 60.1 |
13 | Qwen2-VL | Qwen2-7B-Instruct | 1fps(300) | 49.7 | 50.1 | 33.8 | 54.8 | 68.0 |
14 | MiniCPM-o 2.6 | Qwen2.5-7B-Instruct | 1fps(64) | 49.0 | 52.9 | 28.5 | 55.9 | 67.1 |
15 | MiniCPM-V 2.6 | Qwen2-7B-Instruct | 1fps(64) | 42.9 | 44.1 | 24.1 | 49.4 | 62.9 |
16 | LLaVA-OneVision | Qwen2-7B-Instruct | 32f | 41.8 | 43.9 | 33.8 | 38.8 | 53.5 |
17 | Oryx-1.5 | Qwen2.5-7B-Instruct | 128f | 41.4 | 43.8 | 32.2 | 37.6 | 55.8 |
18 | Video-LLaVA | Vicuna-7B-v1.5 | 8f | 38.2 | 42.2 | 34.4 | 34.5 | 41.1 |
19 | VideoLLaMA2 | Mistral-7B-Instruct-v0.2 | 32f | 31.4 | 33.6 | 23.3 | 34.9 | 36.6 |
20 | VideoChat2-Mistral | Mistral-7B-Instruct-v0.2 | 16f | 29.6 | 27.5 | 25.9 | 34.7 | 33.1 |
21 | ShareGPT4Video | Vicuna-7B-v1.5 | 16f | 25.6 | 23.2 | 18.9 | 31.4 | 34.1 |
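
The table above can be queried programmatically. The following is a minimal Python sketch, not part of the benchmark's own tooling, that parses rows in this pipe-delimited layout and looks up the best model per column. The names `COLUMNS`, `parse_row`, `best_by`, and `category_mean` are hypothetical helpers introduced for illustration, and only three rows (copied verbatim from the table) are included to keep the example short.

```python
from typing import Dict, List

# Column names are an assumption that mirrors the table header
# ("#" is renamed to "rank" for readability).
COLUMNS = ["rank", "model", "llm", "frames",
           "overall", "event", "object", "culture", "science"]
SCORE_COLUMNS = COLUMNS[4:]
CATEGORY_COLUMNS = ["event", "object", "culture", "science"]

# Three rows copied verbatim from the leaderboard above.
SAMPLE_ROWS = [
    "1 | Gemini-2.0-Flash | Gemini-2.0-Flash | 1fps | 76.3 | 74.0 | 77.1 | 68.0 | 87.4 |",
    "4 | Qwen2.5-VL-72B | Qwen2.5-72B-Instruct | 1fps(300) | 61.3 | 61.0 | 40.5 | 71.2 | 83.3 |",
    "5 | VideoLLaMA3 | Qwen2.5-7B-Instruct | 1fps(180) | 60.7 | 58.0 | 66.4 | 53.1 | 64.4 |",
]


def parse_row(line: str) -> Dict[str, object]:
    """Parse one pipe-delimited leaderboard row into a dict."""
    # Drop the trailing "|" before splitting so no empty cell is produced.
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    row: Dict[str, object] = dict(zip(COLUMNS, cells))
    row["rank"] = int(cells[0])
    for name in SCORE_COLUMNS:
        row[name] = float(str(row[name]))
    return row


def best_by(rows: List[Dict[str, object]], column: str) -> Dict[str, object]:
    """Return the row with the highest score in the given column."""
    return max(rows, key=lambda r: r[column])


def category_mean(row: Dict[str, object]) -> float:
    """Unweighted mean of the four category columns (not the reported Overall)."""
    return sum(float(row[c]) for c in CATEGORY_COLUMNS) / len(CATEGORY_COLUMNS)


if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE_ROWS]
    for column in SCORE_COLUMNS:
        top = best_by(rows, column)
        print(f"{column:>8}: {top['model']} ({top[column]})")
    first = rows[0]
    print(f"{first['model']}: reported overall {first['overall']}, "
          f"unweighted category mean {category_mean(first):.3f}")
```

Note that the reported Overall does not have to equal the unweighted mean of the four category columns. For Gemini-2.0-Flash, that mean is 76.625, while the table reports 76.3. One possible explanation, which is an assumption and not stated in the table, is that Overall is averaged over questions or finer-grained tasks, so the categories carry unequal weight.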