Suggestion to deal with omission of periods #9

Open
busdriverbuddha opened this issue Jan 31, 2024 · 7 comments

Comments

@busdriverbuddha

Whisper frequently produces transcript segments in which sentence-final periods (full stops) are missing. Example (not a real transcription, just to illustrate the issue):

Meghan Elizabeth Trainor is an American singer-songwriter and television personality She rose to prominence after signing with Epic Records in 2014 and releasing her debut single All About That Bass, which reached number one on the U.S. Billboard Hot 100 chart and sold 11 million copies worldwide Trainor has released five studio albums with the label and has received various accolades, including the 2016 Grammy Award for Best New Artist.

I have found that adding about 5 seconds of white noise to the beginning of the affected excerpt and retranscribing it usually corrects the punctuation.
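A minimal sketch of this workaround, assuming the excerpt is a 16-bit PCM WAV file. The function name `prepend_white_noise`, its parameters, and the noise level are illustrative choices and not part of any library discussed here:

```python
import array
import random
import sys
import wave


def prepend_white_noise(in_path, out_path, seconds=5.0, noise_amplitude=200, seed=0):
    """Write a copy of a 16-bit PCM WAV with `seconds` of white noise prepended.

    Returns the offset (in seconds) by which all timestamps in the new
    file are shifted relative to the original.
    """
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = src.readframes(params.nframes)

    # Gaussian noise, one value per channel per frame, clipped to the int16 range.
    # `noise_amplitude` (standard deviation in sample units) is an assumed value.
    rng = random.Random(seed)
    n_values = int(round(seconds * params.framerate)) * params.nchannels
    noise = array.array(
        "h",
        (max(-32768, min(32767, int(rng.gauss(0, noise_amplitude))))
         for _ in range(n_values)),
    )
    if sys.byteorder == "big":
        noise.byteswap()  # WAV sample data is little-endian

    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(noise.tobytes() + frames)
    return seconds
```

Any timestamps obtained from the padded file would need the returned offset subtracted to line up with the original audio.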

Perhaps this could be incorporated into the code. Or, if there were a way to isolate the affected region (e.g. with information from the VAD), a separate function could be written to check for this hallucination, export the WAV for the affected region, and retranscribe it.
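One possible shape for such a check, as a rough sketch only: flag a segment when it contains an unusually long run of words with no sentence-ending punctuation. The function names, the regular expressions, and the 40-word threshold are all assumptions for illustration and would need tuning on real transcripts:

```python
import re

# Initialisms such as "U.S." end in a period but do not end a sentence.
_ABBREV = re.compile(r"^(?:[A-Za-z]\.){2,}$")
# A sentence terminator, optionally followed by closing quotes or brackets.
_TERMINATOR = re.compile(r"[.!?][\"')\]]*$")


def longest_unpunctuated_run(text):
    """Return the largest number of consecutive words without a sentence end."""
    longest = current = 0
    for token in text.split():
        current += 1
        longest = max(longest, current)
        if _TERMINATOR.search(token) and not _ABBREV.match(token):
            current = 0  # a sentence ended here, start counting again
    return longest


def looks_like_missing_periods(text, max_words=40):
    """Heuristic flag: True if some run of words exceeds `max_words`."""
    return longest_unpunctuated_run(text) > max_words
```

Segments flagged this way could then be cut out of the audio and retranscribed. Genuinely long sentences will produce false positives, so the threshold is a trade-off rather than a fixed rule.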

@shashikg
Owner

shashikg commented Jan 31, 2024

Hi @busdriverbuddha

These are possibly some of the failure modes of Whisper's autoregressive (language-model-like) decoder.

> I have found that adding about 5 seconds of white noise to the beginning of the affected excerpt and retranscribing it usually corrects the punctuation.

Interesting. Do you have any detailed evaluation of it, e.g. how much punctuation accuracy improves after adding the noise, and whether it has any effect on the WER? One issue I see with this approach is that it increases inference time, since the extra audio has to be processed as well.
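For reference, such an evaluation could be sketched roughly as below: WER computed on lowercased words with punctuation stripped, plus a simple "period recall" that checks how many reference periods survive on the aligned words of the hypothesis. The function names and the metric definition are assumptions made for illustration, not an established benchmark:

```python
import difflib
import re


def _normalize(token):
    """Lowercase a token and strip punctuation so only the word is compared."""
    return re.sub(r"[^\w']+", "", token.lower())


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = [w for w in map(_normalize, reference.split()) if w]
    hyp = [w for w in map(_normalize, hypothesis.split()) if w]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def period_recall(reference, hypothesis):
    """Fraction of reference words ending in '.' whose aligned hypothesis word does too.

    Only words that align between the two texts are considered. Abbreviations
    such as "U.S." count as ending in a period, which is a known limitation.
    """
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    matcher = difflib.SequenceMatcher(
        a=[_normalize(t) for t in ref_tokens],
        b=[_normalize(t) for t in hyp_tokens],
        autojunk=False,
    )
    expected = found = 0
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            if ref_tokens[block.a + k].endswith("."):
                expected += 1
                found += hyp_tokens[block.b + k].endswith(".")
    return found / expected if expected else 1.0
```

Running both metrics on transcripts produced with and without the noise prefix would show whether punctuation improves without the words themselves getting worse.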

Can you try this and check if it helps?

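# `model` is assumed to be an already-loaded model object.
# Each list holds one entry per input file.
files = ['audio.wav']
lang_codes = ['en']
tasks = ['transcribe']
# A fully punctuated prompt may nudge the decoder toward punctuated output.
initial_prompts = ['This is a documentary about Meghan Elizabeth.']

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=32)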

PS: Using prompting with the Whisper model to align the transcription format is also on my roadmap.

@shashikg
Owner

If you can provide a sample file, I can look into whether VAD margins can somehow be used to mitigate this issue.

@busdriverbuddha
Author

I can supply an MP3 file in which this issue happens predictably. How can I share it with you privately?

@shashikg
Owner

shashikg commented Feb 1, 2024

You can email me: shashikg.iitk@gmail.com

@busdriverbuddha
Author

busdriverbuddha commented Feb 1, 2024 via email

@shashikg
Owner

shashikg commented Feb 6, 2024

Hi, I got your email. I will get back to you by the coming weekend.

@busdriverbuddha
Author

busdriverbuddha commented Feb 6, 2024 via email
