Hyperparameter Tuning Strategies #2

Open
shrutijpalaskar opened this issue Jul 26, 2021 · 1 comment

@shrutijpalaskar commented Jul 26, 2021

Hi Jaemin,

Thanks for the very interesting paper and releasing your codebase!

I have been working with your codebase on a different multimodal text generation task and am observing lower performance with VL-T5 and VL-BART than with other comparable models. I suspect this might be a hyperparameter tuning issue. Do you have any advice on which particular parameters would be most beneficial to tune? I am currently following the Multi30K settings for the learning rate and number of epochs from Table 14 in your paper.

@j-min (Owner) commented Jul 26, 2021

Hi @shrutijpalaskar. Since I had to run all pretraining/finetuning experiments on a 4 x 10GB RTX 2080 Ti server (much smaller than the setups used in recent work from big companies), I couldn't run a wide hyperparameter search, so the current hyperparameters are under-tuned and may be far from optimal. I expect VL-T5/VL-BART could achieve higher scores on the benchmarks with better hyperparameters.
In my experiments, I didn't observe much difference when tuning the finetuning hyperparameters (e.g., batch size, learning rate, number of epochs). I did see improvements from longer pretraining (10 epochs -> 30 epochs; I didn't have time to explore longer schedules) and from bigger backbone architectures (e.g., t5-small -> t5-base), both of which are somewhat expected.
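For what it's worth, a small sweep like the one described above can be scripted in a few lines. The sketch below only enumerates configurations and prints commands; the script name (`finetune.py`) and its flags are hypothetical placeholders rather than the repository's actual entry point, and the value grids are illustrative, not the paper's tuned settings.

```python
# Sketch: enumerate a small finetuning grid and emit one command per configuration.
# NOTE: "finetune.py" and the --backbone/--lr/--epochs flags are hypothetical
# placeholders; adapt them to the actual finetuning script and its arguments.
import itertools

learning_rates = [5e-5, 1e-4, 3e-4]      # illustrative values
num_epochs = [10, 20, 30]                # illustrative values
backbones = ["t5-small", "t5-base"]      # smaller vs. bigger backbone, as discussed above

for lr, epochs, backbone in itertools.product(learning_rates, num_epochs, backbones):
    # Pipe this output into bash or a job scheduler to launch the runs.
    print(f"python finetune.py --backbone {backbone} --lr {lr} --epochs {epochs}")
```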
What is your target multimodal text generation task?
