Could you release the evaluation scripts with the Vicuna model early? #1

Open · KerolosAtef opened this issue on Dec 5, 2023 · 6 comments
@KerolosAtef

No description provided.

@avinash31d

+1

@shehanmunasinghe (Collaborator)

@KerolosAtef @avinash31d, thank you for your interest in our work. Please find the details about the Vicuna-based quantitative evaluation benchmark here: https://github.com/mbzuai-oryx/Video-LLaVA/tree/main/quantitative_evaluation.
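
For context, benchmarks of this style score each (question, ground-truth answer, predicted answer) triple with an LLM judge. Below is a minimal sketch of such a judge call; the model ID, prompt wording, and output format are illustrative assumptions, not the repository's exact setup:

```python
# Minimal sketch of a Vicuna-as-judge scoring step for video QA evaluation.
# Assumptions (not from the repository): the model ID, the prompt wording,
# and the "yes/no + 0-5 score" output format.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "lmsys/vicuna-7b-v1.5"  # illustrative judge checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def judge(question: str, answer: str, prediction: str) -> str:
    # Hypothetical judge prompt: ask the LLM whether the prediction matches
    # the ground-truth answer and to give an integer score from 0 to 5.
    prompt = (
        "Evaluate whether the predicted answer matches the correct answer.\n"
        f"Question: {question}\n"
        f"Correct answer: {answer}\n"
        f"Predicted answer: {prediction}\n"
        "Reply with 'yes' or 'no' followed by an integer score from 0 to 5."
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The judge's "yes/no" verdicts are then aggregated into the accuracy numbers reported on the benchmarks.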

@KerolosAtef (Author)

Thank you very much. However, the Vicuna model does not produce the same results on each run.

I have tried to reproduce some of the Video-ChatGPT results, and these are the numbers I get:
ActivityNet: Acc 36.13 instead of 40.8
TGIF: Acc 63.07 instead of 66.5

@shehanmunasinghe (Collaborator)

@KerolosAtef We attribute this to the randomness introduced by the temperature parameter in both the tested model and the LLM used for evaluation. This will be addressed in our future work.
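
For anyone trying to tighten reproducibility on the evaluation side, the run-to-run variance from sampling can be removed with greedy decoding, or bounded by fixing the RNG seed. A minimal sketch with Hugging Face transformers; the model ID and generation settings are illustrative, not the repository's actual configuration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

MODEL_ID = "lmsys/vicuna-7b-v1.5"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
inputs = tokenizer("Example evaluation prompt", return_tensors="pt").to(model.device)

# Option 1: greedy decoding - deterministic; temperature has no effect.
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Option 2: keep sampling but seed all RNGs so repeated runs match.
set_seed(42)  # seeds Python, NumPy, and torch generators
out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
```

Greedy decoding trades some answer diversity for exact repeatability, which is usually an acceptable trade-off for a judge model.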

@KerolosAtef (Author)

Okay, good.
I want to confirm one thing: for the zero-shot datasets (MSVD, MSR-VTT, ActivityNet, TGIF), did you use the test data or the validation data?

@shehanmunasinghe (Collaborator)

We follow the same approach as Video-ChatGPT, i.e., using the test splits.
