WhisperS2T now supports the TensorRT-LLM backend, achieving double the inference speed compared to the CTranslate2 backend! The current optimal configuration on an A30 GPU achieves transcription of 1-hour files in approximately 18 seconds.
After TensorRT-LLM optimization, the exported model only works on NVIDIA GPUs with the same cuda_compute_capability. This means a model exported on a T4 GPU won't work on an A100, and vice versa.
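The compatibility constraint above can be checked up front before loading a cached engine. Here is a minimal stdlib-only sketch (the function name is illustrative, not part of WhisperS2T) comparing the compute capability an engine was exported with against that of the current GPU:

```python
def engines_compatible(export_cc, runtime_cc):
    """A TensorRT-LLM engine built on one GPU only loads on GPUs whose
    CUDA compute capability matches the one it was exported with.

    Capabilities are (major, minor) tuples, e.g. (7, 5) for a T4."""
    return tuple(export_cc) == tuple(runtime_cc)

# T4 is compute capability 7.5 and A100 is 8.0, so a T4 export
# won't load on an A100.
print(engines_compatible((7, 5), (8, 0)))  # False
print(engines_compatible((7, 5), (7, 5)))  # True
```

At runtime, the current GPU's capability could be read with, e.g., `torch.cuda.get_device_capability()` before deciding whether a cached engine is usable.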
Help Needed: Model export takes about 3-6 minutes. I need volunteers to export the model for their specific GPUs and share it. It would be a huge help to the community! I have access to A30, A100, and T4 GPUs, for which I will add the exported models.
PS: I will update this discussion in a few weeks on how to contribute your exported model.
Thanks,
Shashi
@shashikg It's a very nice and exciting project. If you're not in a hurry, I will be available to help you after 20 days. One concern is that, with updates to the tensorrt-llm C++ runtime, these exported models might become obsolete. It would be best to have a one-button script to handle this, so we can regularly generate the latest models and then upload them to Hugging Face, like this: https://huggingface.co/csukuangfj/k2/tree/main/cuda with tensorrt-version-gpu-model-name.pt
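The tensorrt-version-gpu-model-name naming convention suggested above can be sketched as a small helper (the function name and the version string are illustrative assumptions, not an agreed-upon scheme):

```python
def engine_filename(trt_version, gpu_name, model_name):
    """Build a shareable filename following the suggested
    tensorrt-version-gpu-model-name.pt convention."""
    safe_gpu = gpu_name.lower().replace(" ", "-")
    return f"{trt_version}-{safe_gpu}-{model_name}.pt"

# Hypothetical example: TensorRT-LLM version and GPU name are assumptions.
print(engine_filename("0.7.1", "Tesla T4", "large-v2"))
# 0.7.1-tesla-t4-large-v2.pt
```

Encoding all three fields in the filename makes it obvious on Hugging Face which runtime version and GPU an uploaded engine targets.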
Hi @yuekaizhang thanks for showing interest in this project!
> If you're not in a hurry, I will be available to help you after 20 days.
Yeah, no hurry... I am planning to maintain this project for the long term!
> It would be best to have a one-button script to handle this, so we can regularly generate the latest models and then upload them to Hugging Face.
Good suggestion. At present, exported models are cached locally in a tmp directory (even this helps: if anyone uses the package again on the same device, it picks up the exported model from the cache directory). I am working on generating a tarred format for the exported model, which can be used to load the model or share it. I will also integrate the huggingface_hub client library to automate the upload process.
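The tar-and-upload flow described above might look roughly like this. A minimal sketch, assuming an export directory on disk and a hypothetical target repo id; the function names are illustrative, not WhisperS2T's actual API:

```python
import tarfile
from pathlib import Path


def tar_exported_model(export_dir, output_path):
    """Bundle a TensorRT-LLM export directory into one shareable tarball."""
    export_dir = Path(export_dir)
    with tarfile.open(output_path, "w:gz") as tar:
        # Store the directory under its own name inside the archive.
        tar.add(export_dir, arcname=export_dir.name)
    return Path(output_path)


def upload_to_hub(tar_path, repo_id):
    """Push the tarball to a Hugging Face repo (repo_id is hypothetical)."""
    # Imported lazily; requires the huggingface_hub package and a login token.
    from huggingface_hub import HfApi

    api = HfApi()
    api.upload_file(
        path_or_fileobj=str(tar_path),
        path_in_repo=Path(tar_path).name,
        repo_id=repo_id,
    )
```

`HfApi.upload_file` handles the commit to the model repo, so a single script can cover export, tarring, and upload end to end.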