
Reproducing benchmarks #11

Open
sanchit-gandhi opened this issue Feb 5, 2024 · 3 comments

Comments

sanchit-gandhi commented Feb 5, 2024

Hey @shashikg! Thanks for your awesome work on this repo - it's a very cool compilation of the various Whisper implementations 🙌

I'm working on the Hugging Face implementation, and I'm keen to better understand how we can reproduce the numbers from your benchmark. In particular, I'm looking at reproducing the numbers from this table.

The benchmark scripts currently use a local version of the Kincaid dataset:

data = pd.read_csv(f'{repo_path}/data/KINCAID46/manifest_mp3.tsv', sep="\t")

Would it be possible to share this dataset, in order to re-run the numbers locally? You could push it as a Hugging Face Audio dataset to the Hugging Face Hub, which should be quite straightforward by following this guide: https://huggingface.co/docs/datasets/audio_dataset

Once we can reproduce the runs, we'd love to work with you on tuning the Transformers benchmark to squeeze out any extra performance that might be available.

Many thanks!

shashikg (Owner) commented Feb 6, 2024

Hey @sanchit-gandhi !

You can prepare the benchmark env using this script: https://github.com/shashikg/WhisperS2T/blob/main/prepare_benchmark_env.sh. This will download the required datasets.

Please also check these numbers for distil-whisper: https://github.com/shashikg/WhisperS2T/releases/tag/v1.1.0

> Once we can reproduce the runs, we'd love to work with you on tuning the Transformers benchmark to squeeze out extra performance that might be available

Sure, would love to!


BBC-Esq commented Feb 21, 2024

Hey @sanchit-gandhi, despite our prior correspondence regarding the "insanely" faster whisper, I've tested this awesome library and it's accurate. It's actually faster than anything I've ever tested, including "insanely" insane faster whisper. I say this with all humility given our prior correspondence, but this is apparently what batch processing + CTranslate2 can do. It has its kinks, timestamps for example, but if you find different results please feel free to share. Finally, the "apples to apples" comparison I was lamenting the lack of in our previous correspondence.

As a matter of integrity, please confirm if/when you've verified the results from this repository as well, as I'm assuming that you have an interest in the truth as opposed to a "whose is better" kind of mentality. Thanks.


BBC-Esq commented Feb 21, 2024

@sanchit-gandhi two weeks and no confirmation of the results? ...Hmm... is your interest really in "verifying" the accuracy of the results, or in promoting Hugging Face's ~thousands of stars for the "insanely" insane, absolutely insane, pure-insanity... that is... totally insane! Whisper. C'mon man. Admit the metrics and let's move forward.

I also like how you tried to recruit this fellow to come work with Huggingface. Sheesh. Will the egos never stop...
