
Reproducing benchmarks #11

Open
sanchit-gandhi opened this issue Feb 5, 2024 · 3 comments

Comments

sanchit-gandhi commented Feb 5, 2024

Hey @shashikg! Thanks for your awesome work on this repo - it's a very cool compilation of the various Whisper implementations 🙌

I'm working on the Hugging Face implementation, and I'm keen to better understand how we can reproduce the numbers from your benchmark. In particular, I'm looking at reproducing the numbers from this table.

The benchmark scripts currently use a local version of the Kincaid dataset:

data = pd.read_csv(f'{repo_path}/data/KINCAID46/manifest_mp3.tsv', sep="\t")

Would it be possible to share this dataset, in order to re-run the numbers locally? You could push it as a Hugging Face Audio dataset to the Hugging Face Hub, which should be quite straightforward by following this guide: https://huggingface.co/docs/datasets/audio_dataset

Once we can reproduce the runs, we'd love to work with you on tuning the Transformers benchmark to squeeze out any extra performance that might be available.

Many thanks!

shashikg (Owner) commented Feb 6, 2024

Hey @sanchit-gandhi !

You can prepare the benchmark env using this script: https://github.com/shashikg/WhisperS2T/blob/main/prepare_benchmark_env.sh. This will download the required datasets.

Please also check these numbers for distil-whisper: https://github.com/shashikg/WhisperS2T/releases/tag/v1.1.0

> Once we can reproduce the runs, we'd love to work with you on tuning the Transformers benchmark to squeeze out extra performance that might be available

Sure, would love to!


BBC-Esq commented Feb 21, 2024

Hey @sanchit-gandhi, despite our prior correspondence regarding the "insanely" faster whisper, I've tested this awesome library and it's accurate. It's actually faster than anything I've ever tested, including "insanely" insane faster whisper. I say this with all humility given our prior correspondence, but this is apparently what batch processing + CTranslate2 can do. It has its kinks, timestamps for example, but if you find different results please feel free to share. Finally, the "apples to apples" comparison I was lamenting the lack of in our previous correspondence.

As a matter of integrity, please confirm if/when you've verified the results from this repository as well, as I'm assuming that you have an interest in the truth as opposed to a "whose is better" kind of mentality. Thanks.


BBC-Esq commented Feb 21, 2024

@sanchit-gandhi two weeks and no confirmation of the results? ...Hmm... is your interest really in "verifying" the accuracy of the results, or in promoting Hugging Face's ~thousands of stars for the "insanely" insane, absolutely insane, pure-insanity... that is... totally insane! Whisper. C'mon man. Admit the metrics and let's move forward.

I also like how you tried to recruit this fellow to come work with Huggingface. Sheesh. Will the egos never stop...
