Home

This wiki is a work in progress! Expect errors and details I missed or got wrong.

Synthalingua Wiki

Welcome to the Synthalingua wiki! Here you'll find detailed information on how to use and troubleshoot Synthalingua, a powerful AI-powered real-time audio translation tool.

Getting Started

System Requirements

Synthalingua needs a system that meets at least the Minimum column below. The other columns list progressively better-performing configurations:

Requirement	Minimum	Moderate	Recommended	Best Performance
CPU Cores	2	6	8	16
CPU Clock Speed (GHz)	2.5 or higher	3.0 or higher	3.5 or higher	4.0 or higher
RAM (GB)	4 or higher	8 or higher	16 or higher	16 or higher
GPU VRAM (GB)	2 or higher	6 or higher	8 or higher	12 or higher
Free Disk Space (GB)	10 or higher	10 or higher	10 or higher	10 or higher
GPU (suggested)	Nvidia GTX 1050 or higher	Nvidia GTX 1660 or higher	Nvidia RTX 3070 or higher	Nvidia RTX 3090 or higher

Notes:

Nvidia GPUs are supported on Linux and Windows.
Nvidia GPU is suggested but not required.
AMD GPUs are supported on Linux, not Windows.
A microphone is optional. You can use the --stream flag to stream audio from an HLS stream.
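
To see roughly where a machine falls in the table above, a small script can compare the core count and free disk space against the listed figures. This is a minimal sketch and not part of Synthalingua: the function names are made up for illustration, and os.cpu_count() reports logical CPUs, which may be higher than the physical core count the table refers to. It does not check clock speed, RAM, or VRAM.

```python
import os
import shutil

# Tier names and CPU core counts copied from the requirements table above.
CPU_CORE_TIERS = [
    ("Best Performance", 16),
    ("Recommended", 8),
    ("Moderate", 6),
    ("Minimum", 2),
]
MIN_FREE_DISK_GB = 10  # the table lists 10 GB for every tier


def cpu_tier(cores):
    """Return the highest tier whose core count is met, or None if below Minimum."""
    for name, needed in CPU_CORE_TIERS:
        if cores >= needed:
            return name
    return None


def has_enough_disk(free_bytes):
    """Return True if the free space meets the 10 GB figure from the table."""
    return free_bytes / 1024**3 >= MIN_FREE_DISK_GB


if __name__ == "__main__":
    cores = os.cpu_count() or 0  # logical CPUs, not physical cores
    free = shutil.disk_usage(".").free
    print("CPU cores:", cores, "->", cpu_tier(cores) or "below minimum")
    print("Free disk space OK:", has_enough_disk(free))
```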

Installation

Install Python: Download and install Python 3.10.9. Ensure you select the "Add Python to PATH" option during installation.
Install Git: Download and install Git. Using default settings is recommended.
Install FFmpeg: Follow the instructions provided here to install FFmpeg.
Install CUDA (Optional): If you plan to utilize your Nvidia GPU, download and install CUDA from here.
Run Setup Script:
- On Windows: Execute the setup.bat file.
- On Linux: Execute the setup.bash file. Ensure you have gcc and portaudio19-dev (or portaudio-devel for some systems) installed.
Run Synthalingua: Execute the newly created batch file or bash script. You can modify this file to customize the settings.
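
Before running the setup script, it can help to confirm that the tools from the steps above are reachable on PATH. The following is a minimal sketch using only the Python standard library. The executable names are assumptions (many Linux distributions ship "python3" rather than "python"), and missing_tools is a hypothetical helper, not something provided by Synthalingua.

```python
import shutil

# Assumed executable names for the tools listed in the installation steps.
# Adjust them if your system uses different names (for example "python3").
REQUIRED_TOOLS = ["python", "git", "ffmpeg"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```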

Usage

Command Line Arguments

Synthalingua is configured through command line arguments. The table below lists the available flags:

Flag	Description
`--ram`	Specify the amount of RAM to allocate. Default: 4GB. Options: "1GB", "2GB", "4GB", "6GB", "12GB".
`--ramforce`	Force the script to use the specified VRAM amount. Caution: May lead to crashes if insufficient VRAM is available.
`--energy_threshold`	Set the microphone's audio detection sensitivity. Default: 100. Range: 1-1000 (higher values decrease sensitivity).
`--mic_calibration_time`	Duration in seconds for microphone calibration. Set to 0 to skip user input and use the default 5 seconds.
`--record_timeout`	Real-time recording duration in seconds. Default: 2 seconds.
`--phrase_timeout`	Silence duration in seconds between recordings before considering it a new line. Default: 1 second.
`--translate`	Enable translation of transcriptions to English.
`--transcribe`	Enable transcription of audio to the specified target language. Requires the `--target_language` flag.
`--target_language`	Specify the target language for translation or transcription. Use ISO 639-1 language codes or their English names.
`--language`	Specify the source language for translation. Use ISO 639-1 language codes or their English names.
`--auto_model_swap`	Enable automatic model switching based on the detected language.
`--device`	Select the processing unit for the model. Default: "cuda" (if available). Options: "cpu", "cuda".
`--cuda_device`	Specify the CUDA device ID to utilize. Default: 0.
`--discord_webhook`	Set the Discord webhook URL to receive transcriptions.
`--list_microphones`	Display a list of available microphones and exit.
`--set_microphone`	Set the default microphone using its name or ID from the list generated by `--list_microphones`.
`--microphone_enabled`	Enable or disable microphone usage. Use `true` or `false` after the flag.
`--auto_language_lock`	Automatically lock onto the detected language after 5 detections. Reduces latency.
`--use_finetune`	Utilize the fine-tuned model for increased accuracy (at the cost of higher latency and resource usage).
`--no_log`	Display only the most recent translation/transcription instead of a log-style output.
`--updatebranch`	Specify the repository branch to check for updates. Default: "master". Options: "master", "dev-testing", "bleeding-under-work", "disable".
`--keep_temp`	Retain audio files in the "out" folder. Note: This will consume storage space over time.
`--portnumber`	Set the port number for the web server. If not specified, the web server will not start.
`--retry`	Enable retrying translations and transcriptions in case of failures.
`--about`	Display information about the application.
`--save_transcript`	Enable saving the transcript to a text file.
`--save_folder`	Specify the folder to save the transcript to.
`--stream`	Stream audio from a specified HLS stream URL.
`--stream_language`	Specify the language of the audio stream. Default: English.
`--stream_target_language`	Specify the target language for stream translation or transcription. Default: English.
`--stream_translate`	Enable translation of the audio stream.
`--stream_transcribe`	Enable transcription of the audio stream to the specified target language.
`--stream_original_text`	Display the detected original text from the stream.
`--stream_chunks`	Specify the number of chunks to split the stream into. Default: 5 (recommended range: 3-5 for most streams, 1-2 for YouTube, 5-10 for Twitch).
`--cookies`	Specify the filename of the cookies file (without extension) located in the "cookies" folder.
`--makecaptions`	Enable caption generation mode. Requires `--file_input`, `--file_output`, and `--file_output_name` flags.
`--file_input`	Specify the path to the input audio/video file for caption generation.
`--file_output`	Specify the folder to save the generated captions to.
`--file_output_name`	Specify the filename for the generated captions (without extension).
`--ignorelist`	Specify the path to a text file containing a list of words or phrases to ignore.
`--condition_on_previous_text`	Enable conditioning the model on previous text to reduce repetition (may impact speed).
`--remote_hls_password_id`	Specify the password ID for accessing password-protected HLS streams. Default: "key".
`--remote_hls_password`	Specify the password for accessing password-protected HLS streams.

Examples

Caption Generation:

python transcribe_audio.py --ram 12gb --makecaptions --file_input="C:\Users\username\Downloads\video.mp4" --file_output="C:\Users\username\Downloads" --file_output_name="captions" --language Japanese --device cuda
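
Because --makecaptions requires three companion flags, a wrapper script can catch a missing one before launching the tool. The sketch below only assembles the argument list from flags documented in the table above. build_caption_command is a hypothetical helper, and it assumes transcribe_audio.py is in the current directory and that "python" is the right interpreter name on your system.

```python
def build_caption_command(file_input, file_output, file_output_name,
                          language=None, ram=None, device=None):
    """Assemble a --makecaptions command line from documented flags only."""
    required = {
        "--file_input": file_input,
        "--file_output": file_output,
        "--file_output_name": file_output_name,
    }
    # Refuse to build a command the tool would reject.
    missing = [flag for flag, value in required.items() if not value]
    if missing:
        raise ValueError("--makecaptions also needs: " + ", ".join(missing))

    cmd = ["python", "transcribe_audio.py", "--makecaptions"]
    # Use the --flag=value form, as in the example above.
    for flag, value in required.items():
        cmd.append(f"{flag}={value}")
    # Optional flags are added only when a value was given.
    for flag, value in (("--language", language), ("--ram", ram), ("--device", device)):
        if value:
            cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    print(build_caption_command("video.mp4", "captions_out", "captions",
                                language="Japanese", ram="12gb", device="cuda"))
```

The returned list can be passed to subprocess.run as-is, which avoids shell quoting problems with paths that contain spaces.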

Live Stream Translation:

python transcribe_audio.py --ram 12gb --stream_translate --stream_language Japanese --stream https://www.twitch.tv/somestreamerhere
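
The --stream_chunks row in the flag table recommends different ranges for YouTube, Twitch, and other streams. As a rough illustration, the sketch below picks the documented range from the stream URL. Matching on the hostname is an assumption made for this example, and suggested_chunk_range is a hypothetical helper. Synthalingua itself may not distinguish platforms this way.

```python
from urllib.parse import urlparse

# Ranges copied from the --stream_chunks row of the flag table.
CHUNK_RANGES = {
    "youtube": (1, 2),
    "twitch": (5, 10),
    "default": (3, 5),
}


def _is_domain(host, domain):
    """True if host is `domain` itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def suggested_chunk_range(stream_url):
    """Return the (low, high) --stream_chunks range suggested for a URL."""
    host = (urlparse(stream_url).hostname or "").lower()
    # Hostname matching is an illustrative assumption, not documented behavior.
    if _is_domain(host, "youtube.com") or host == "youtu.be":
        return CHUNK_RANGES["youtube"]
    if _is_domain(host, "twitch.tv"):
        return CHUNK_RANGES["twitch"]
    return CHUNK_RANGES["default"]


if __name__ == "__main__":
    low, high = suggested_chunk_range("https://www.twitch.tv/somestreamerhere")
    print(f"Try --stream_chunks between {low} and {high}")
```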

Discord Integration:

python transcribe_audio.py --ram 6gb --translate --language ja --discord_webhook "https://discord.com/api/webhooks/1234567890/1234567890" --energy_threshold 300

Setting Microphone:

List microphones: python transcribe_audio.py --list_microphones
Set microphone: python transcribe_audio.py --set_microphone "Microphone Name" or python transcribe_audio.py --set_microphone 2 (using the ID shown in the list)

Web Server

Start the web server using the --portnumber flag:

python transcribe_audio.py --portnumber 4000

Access the web interface at http://localhost:4000. Use query parameters to control element visibility:

?showoriginal: Show original detected text.
?showtranslation: Show translated text.
?showtranscription: Show transcribed text.
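
When several elements should be visible at once, the parameters above can be combined into one URL. The sketch below joins them with "&", which is standard URL query syntax. That the web interface accepts them combined this way is an assumption, and overlay_url is a hypothetical helper name.

```python
def overlay_url(port, show_original=False, show_translation=False,
                show_transcription=False, host="localhost"):
    """Build the web interface URL with the chosen visibility parameters."""
    params = []
    if show_original:
        params.append("showoriginal")
    if show_translation:
        params.append("showtranslation")
    if show_transcription:
        params.append("showtranscription")

    url = f"http://{host}:{port}"
    if params:
        # Value-less parameters joined with "&" (assumed to be accepted).
        url += "?" + "&".join(params)
    return url


if __name__ == "__main__":
    print(overlay_url(4000, show_original=True, show_translation=True))
```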

Word Block List

Use the --ignorelist flag to specify a text file containing words or phrases to exclude from the output:

python transcribe_audio.py --ignorelist "C:\path\to\wordlist.txt"
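
The word list can be written by hand or generated. The sketch below writes one phrase per line as UTF-8 text. The one-entry-per-line layout is an assumption, since this page only says the file contains a list of words or phrases, and write_ignore_list is a hypothetical helper.

```python
from pathlib import Path


def write_ignore_list(path, phrases):
    """Write unique, non-empty phrases to a UTF-8 text file, one per line.

    Returns the cleaned list in the order it was written.
    """
    seen = set()
    cleaned = []
    for phrase in phrases:
        phrase = phrase.strip()
        if phrase and phrase not in seen:
            seen.add(phrase)
            cleaned.append(phrase)
    # One entry per line is an assumed layout, not confirmed by this page.
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


if __name__ == "__main__":
    written = write_ignore_list("wordlist.txt",
                                ["Thanks for watching", "subscribe", "subscribe"])
    print("Wrote", len(written), "entries to wordlist.txt")
```

The resulting file path is then passed to --ignorelist as in the command above.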

Cookies

Place cookie files in the "cookies" folder in Netscape format (.txt). Use the --cookies flag to specify the filename without the extension:

python transcribe_audio.py --cookies twitchacc1
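
A cookie file that is missing or not in Netscape format is an easy mistake to make. The sketch below uses Python's standard http.cookiejar module to check a file before passing its name to --cookies. It assumes the "cookies" folder is relative to the current working directory, and both function names are hypothetical helpers rather than part of Synthalingua.

```python
from http.cookiejar import MozillaCookieJar
from pathlib import Path


def cookie_file_path(name, cookies_dir="cookies"):
    """Map the value given to --cookies onto the .txt file it refers to."""
    # The folder location relative to the working directory is an assumption.
    return Path(cookies_dir) / f"{name}.txt"


def count_netscape_cookies(path):
    """Return the number of cookies in a Netscape-format cookie file.

    Raises FileNotFoundError if the file is missing and
    http.cookiejar.LoadError if the file is not in Netscape format.
    """
    jar = MozillaCookieJar()
    # Keep session and expired cookies so the count reflects the whole file.
    jar.load(str(path), ignore_discard=True, ignore_expires=True)
    return len(jar)


if __name__ == "__main__":
    path = cookie_file_path("twitchacc1")
    try:
        print(path, "contains", count_netscape_cookies(path), "cookies")
    except OSError as err:  # covers both a missing file and a format error
        print("Cookie file problem:", err)
```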

Troubleshooting

Refer to the Troubleshooting section in the main README for solutions to common issues.

Additional Information

Models: Synthalingua utilizes fine-tuned models based on OpenAI's Whisper.
Support: For assistance or to report issues, please create an issue on the GitHub repository.

Contributing

We welcome contributions to Synthalingua! Please refer to the Contribution Guidelines for information on how to contribute.
