
TensorRT-LLM Backend Exported Model #8

Open
shashikg opened this issue Jan 28, 2024 · 2 comments
Labels
help wanted (Extra attention is needed)

Comments

@shashikg
Owner

Hey everyone!

WhisperS2T now supports the TensorRT-LLM backend, achieving double the inference speed of the CTranslate2 backend! With the current optimal configuration on an A30 GPU, it transcribes a 1-hour file in approximately 18 seconds.

After TensorRT-LLM optimization, the exported model only runs on NVIDIA GPUs with the same CUDA compute capability as the GPU it was exported on. This means a model exported on a T4 won't work on an A100, and vice versa.
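
For anyone picking up a shared engine, here is a minimal sketch (not part of WhisperS2T) of how you could verify the compute-capability match before loading; the expected capability value below is a hypothetical placeholder:

```python
# Sketch only: guard against loading an engine exported on a GPU with a
# different CUDA compute capability. "8.0" (A100) is a placeholder value.
import torch

def compute_capability(device: int = 0) -> str:
    major, minor = torch.cuda.get_device_capability(device)
    return f"{major}.{minor}"

exported_on = "8.0"  # hypothetical: capability of the GPU used for export
local = compute_capability()
if local != exported_on:
    raise RuntimeError(
        f"Engine was exported for compute capability {exported_on}, "
        f"but this GPU is {local}; re-export the model on this GPU."
    )
```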

Help Needed: Model export takes about 3-6 minutes. I'm looking for volunteers to export the model for specific GPUs and share the results. It would be a huge help to the community! I have access to A30, A100, and T4 GPUs and will add the exported models for those myself.

PS: I will update this discussion in a few weeks with instructions on how to contribute your exported model.

Thanks,
Shashi

@shashikg added the help wanted label Jan 28, 2024
@yuekaizhang

@shashikg It's a very nice and exciting project. If you're not in a hurry, I will be available to help you after 20 days. One concern: as the TensorRT-LLM C++ runtime is updated, these exported models might become obsolete. It would be best to have a one-button script to handle this, so we can regularly generate the latest models and upload them to Hugging Face, like https://huggingface.co/csukuangfj/k2/tree/main/cuda, with names such as tensorrt-version-gpu-model-name.pt.
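
As a rough illustration of that naming scheme (assuming tensorrt_llm exposes a __version__ attribute; the model name and output are placeholders):

```python
# Sketch of the suggested tensorrt-version-gpu-model-name.pt convention.
import torch
import tensorrt_llm  # assumed to expose __version__

gpu = torch.cuda.get_device_name(0).replace(" ", "-").lower()  # e.g. "nvidia-a30"
model = "whisper-large-v2"  # placeholder model name
filename = f"tensorrt-{tensorrt_llm.__version__}-{gpu}-{model}.pt"
print(filename)  # e.g. tensorrt-0.7.1-nvidia-a30-whisper-large-v2.pt
```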

@shashikg
Owner Author

shashikg commented Jan 31, 2024

Hi @yuekaizhang, thanks for showing interest in this project!

If you're not in a hurry, I will be available to help you after 20 days.

Yeah, no hurry. I am planning to maintain this project for the long term!

It would be best to have a one-button script to handle this, so we can regularly generate the latest models and then upload them to Hugging Face.

Good suggestion. At present, exported models are cached locally in a tmp directory (this already helps: if anyone uses the package again on the same device, it picks up the exported model from the cache directory). I am working on generating a tarred format for the exported model, which can be used to load or share the model. I will integrate the huggingface_hub client library to automate the upload process as well.
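
A minimal sketch of that tar-and-upload flow, assuming the exported engine sits in a local cache directory; the cache path, archive name, and repo id below are hypothetical placeholders:

```python
# Sketch: bundle an exported model directory and upload it to Hugging Face.
import tarfile
from huggingface_hub import HfApi

cache_dir = "/tmp/whisper_s2t_export"    # hypothetical cache location
tar_path = "whisper-trt-llm-a30.tar.gz"  # hypothetical artifact name

# Bundle the exported model directory into a single shareable archive.
with tarfile.open(tar_path, "w:gz") as tar:
    tar.add(cache_dir, arcname="whisper-trt-llm-a30")

# Upload the archive to a (hypothetical) Hugging Face model repo.
api = HfApi()
api.upload_file(
    path_or_fileobj=tar_path,
    path_in_repo=tar_path,
    repo_id="your-username/whisper-s2t-trt-llm",  # placeholder repo
)
```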

I will ping here once everything is ready.
