# Evaluation

## ViP-Bench

1. Extract the contents of ViP-Bench to `./playground/data/eval/ViP-Bench`.
2. Run single-GPU inference and evaluation for the bounding-box and human-drawn visual prompts, respectively:

```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vipbench.sh bbox
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vipbench.sh human
```

Optionally, change the model name from `vip-llava-7b` to another LLaVA or ViP-LLaVA model.

3. Submit the results to the evaluation server: `./playground/data/eval/ViP-Bench/results/vip-llava-7b-human.json`.

Optionally, see here for an evaluation script that uses your own OpenAI key.
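
For reference only, a self-contained GPT-assisted grader run with your own key might look like the sketch below. The prediction/answer file formats, field names, grading prompt, and 0-10 scale are assumptions made for illustration, not the project's actual evaluation script.

```python
# Hypothetical sketch of GPT-assisted scoring with your own OpenAI key.
# The file formats, field names, grading prompt, and 0-10 scale are
# assumptions; use the linked script for the official evaluation.
import json
import os

from openai import OpenAI  # pip install openai

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

predictions = json.load(open("playground/data/eval/ViP-Bench/results/vip-llava-7b-human.json"))
meta = json.load(open("playground/data/eval/ViP-Bench/vip-bench-meta-data.json"))

scores = []
for qid, item in meta.items():
    prompt = (
        "Rate how well the prediction answers the question on a 0-10 scale.\n"
        f"Question: {item['question']}\n"
        f"Ground truth: {item['answer']}\n"
        f"Prediction: {predictions[qid]}\n"
        "Reply with a single number."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    scores.append(float(resp.choices[0].message.content.strip()))

print(f"Average score: {sum(scores) / len(scores):.2f}")
```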

## Source annotation

In `source_image`, we provide the source plain images along with the bounding-box/mask annotations. Researchers can use this grounding information to match the special tokens such as `<obj>` in the `"question"` entry of `vip-bench-meta-data.json`. For example, `<obj>` can be replaced by textual coordinates to evaluate region-level multimodal models, as shown in the sketch below.
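
As a minimal illustration of that substitution, the snippet below replaces each `<obj>` token in a question with a textual bounding box; the coordinate format and the way the annotations are loaded from `source_image` are assumptions, so check them against the released files.

```python
# Hypothetical sketch: replace the <obj> placeholders in a ViP-Bench question
# with textual bounding-box coordinates taken from the source_image annotations.
# The [x1, y1, x2, y2] coordinate format is an assumption for illustration.

def fill_coords(question: str, bboxes: list) -> str:
    """Substitute each <obj> token, in order, with its [x1, y1, x2, y2] box."""
    for box in bboxes:
        coords = "[{:.0f}, {:.0f}, {:.0f}, {:.0f}]".format(*box)
        question = question.replace("<obj>", coords, 1)  # one token at a time
    return question

# Example usage with a made-up question and box:
print(fill_coords("What is the object <obj> used for?", [[120, 45, 310, 220]]))
# -> "What is the object [120, 45, 310, 220] used for?"
```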

## Academic Benchmarks

Please download the evaluation JSON dataset here.

### Visual7W

```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/v7w.sh
```

### PointQA-LookTwice

```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/pointQA.sh
```

### Visual Commonsense Reasoning

For Q -> A:

```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vcr-qa.sh
```

For QA -> R:

```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vcr-qar.sh
```