Translation Keeps randomly outputting a word #82

onlygary · 2024-04-06T03:35:54Z

I am using the Whisper Medium EN model, translating from English to Japanese.

Translation works fine while I am speaking, but when I stop talking (even with the mic off), it keeps outputting "レイアウト" every few seconds. I tried other target languages too, and each one keeps outputting a single random word.

I have no idea why this happens. It stops if I turn translation off.

royshil · 2024-04-10T11:11:08Z

@onlygary
This is caused by noise that the model mistakes for speech; it assumes the noise is someone saying "thank you".
Add a noise cancellation filter before (above) the localvocal filter.

The new version coming out will also have a built-in VAD (voice activity detection) that should improve this situation.
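
The plugin's actual VAD is not shown in this thread. As a rough illustration of the general idea (reject audio that is unlikely to be speech before it reaches the model), here is a minimal sketch of a plain energy gate. This is a much cruder technique than a real VAD, since steady background noise can still exceed the threshold, and every name and the threshold value below are hypothetical, not taken from obs-localvocal.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square level of a mono float buffer (samples expected in [-1, 1]).
inline float rms_level(const std::vector<float> &samples)
{
    if (samples.empty())
        return 0.0f;
    double sum_sq = 0.0;
    for (float s : samples)
        sum_sq += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum_sq / static_cast<double>(samples.size())));
}

// Hypothetical gate: true only when the segment is loud enough to be worth
// transcribing. The default threshold is an illustrative value, not a
// setting from the plugin.
inline bool segment_has_energy(const std::vector<float> &samples, float threshold = 0.01f)
{
    assert(threshold >= 0.0f);
    return rms_level(samples) > threshold;
}
```

A caller would skip inference for any segment where segment_has_energy returns false. A muted mic produces all-zero samples, so it would always be rejected by this check.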

royshil · 2024-04-23T20:58:10Z

@onlygary can you please test #92?

royshil · 2024-04-30T17:17:07Z

This is fixed in #95.

takoyaro · 2024-05-10T02:42:46Z

Came here to say that I'm also experiencing this issue.
For context, I have tried muting my mic entirely but unfortunately the issue still arises as the random output is clearly coming from an empty input passed to the translator:
[obs-localvocal] Translation: '' -> 'レイアウト

I'd love to contribute but my C++ knowledge hits its limits here.

11:36:04.341: [obs-localvocal] found 576000 bytes, 144000 frames in input buffer, need >= 576000, processing
11:36:04.341: [obs-localvocal] with 144000 remaining to full segment, popped 144000 info-frames, pushing at 0 (overlap)
11:36:04.341: [obs-localvocal] first segment, no overlap exists, 144000 frames to process
11:36:04.341: [obs-localvocal] processing 144000 frames (3000 ms), start timestamp 1025374744210200 
11:36:04.341: [obs-localvocal] 2 channels, 48000 frames, 3000.000000 ms
11:36:04.347: [obs-localvocal] VAD detected no speech in 48000 frames
11:36:04.347: [obs-localvocal] skipping inference
11:36:04.347: [obs-localvocal] Translating text. __en__ -> __ja__
11:36:04.501: [obs-localvocal] audio processing of 0 ms data took 159 ms

I feel like when "VAD detected no speech in[...]" is logged, the empty output shouldn't be sent to the translation model.
You already have that implemented for speech inference as seen in the logs so that might be an easy win:
[obs-localvocal] skipping inference
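
To make the suggestion concrete, here is a minimal sketch of such a guard, assuming the translator can be modelled as a callback from source text to translated text. The names Translator, is_blank and translate_if_nonempty are hypothetical and do not come from the obs-localvocal code base.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical translator callback: takes source text, returns translated text.
using Translator = std::function<std::string(const std::string &)>;

// True when the text is empty or contains only whitespace.
inline bool is_blank(const std::string &text)
{
    return text.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Call the translator only when there is something to translate.
inline std::string translate_if_nonempty(const std::string &text, const Translator &translate)
{
    assert(translate && "translator callback must be set");
    if (is_blank(text))
        return std::string(); // same idea as "skipping inference": no input, no output
    return translate(text);
}
```

With a guard like this, the '' input from the log above would return an empty string without ever reaching the translation model.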

royshil · 2024-05-10T12:16:14Z

Yep, this is fixed on master but not yet in a released version, which is coming shortly.

royshil · 2024-05-14T01:33:21Z

@takoyaro @onlygary can you test the latest version?

takoyaro · 2024-05-14T03:33:09Z

Issue is fixed on my end. Thank you Roy!

royshil · 2024-06-06T18:29:07Z

Closing this as resolved.

royshil closed this as completed Jun 6, 2024
