cyberofficial edited this page Jul 8, 2024 · 5 revisions

This wiki is a work in progress. It may contain errors or be missing details.

Synthalingua Wiki

Welcome to the Synthalingua wiki! Here you'll find detailed information on how to use and troubleshoot Synthalingua, an AI-powered real-time audio translation tool.

Getting Started

System Requirements

The table below lists the hardware requirements for Synthalingua, from the minimum supported configuration up to the best-performing one:

| Requirement | Minimum | Moderate | Recommended | Best Performance |
|---|---|---|---|---|
| CPU Cores | 2 | 6 | 8 | 16 |
| CPU Clock Speed (GHz) | 2.5 or higher | 3.0 or higher | 3.5 or higher | 4.0 or higher |
| RAM (GB) | 4 or higher | 8 or higher | 16 or higher | 16 or higher |
| GPU VRAM (GB) | 2 or higher | 6 or higher | 8 or higher | 12 or higher |
| Free Disk Space (GB) | 10 or higher | 10 or higher | 10 or higher | 10 or higher |
| GPU (suggested) | Nvidia GTX 1050 or higher | Nvidia GTX 1660 or higher | Nvidia RTX 3070 or higher | Nvidia RTX 3090 or higher |
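
To see roughly where a machine falls in the table, the sketch below maps a CPU core count to a tier. It is only an illustration: the function name tier_for_cores is hypothetical and not part of Synthalingua, and os.cpu_count() reports logical cores, which may be higher than the physical core count the table presumably refers to.

```python
import os

# Minimum CPU core counts per tier, copied from the requirements table.
CORE_TIERS = [
    ("Best Performance", 16),
    ("Recommended", 8),
    ("Moderate", 6),
    ("Minimum", 2),
]


def tier_for_cores(cores):
    """Return the highest tier whose core count is met, or None if below Minimum."""
    for name, required in CORE_TIERS:
        if cores >= required:
            return name
    return None


if __name__ == "__main__":
    detected = os.cpu_count() or 0  # os.cpu_count() may return None
    print(f"{detected} logical cores -> tier: {tier_for_cores(detected)}")
```

The same pattern could be extended to RAM or VRAM, but reading those values portably requires third-party packages, so they are left out here.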

Notes:

  • Nvidia GPUs are supported on Linux and Windows.
  • Nvidia GPU is suggested but not required.
  • AMD GPUs are supported on Linux, not Windows.
  • A microphone is optional. You can use the --stream flag to stream audio from an HLS stream.

Installation

  1. Install Python: Download and install Python 3.10.9. Ensure you select the "Add Python to PATH" option during installation.
  2. Install Git: Download and install Git. Using default settings is recommended.
  3. Install FFMPEG: Follow the instructions provided here to install FFMPEG.
  4. Install CUDA (Optional): If you plan to utilize your Nvidia GPU, download and install CUDA from here.
  5. Run Setup Script:
    • On Windows: Execute the setup.bat file.
    • On Linux: Execute the setup.bash file. Ensure you have gcc and portaudio19-dev (or portaudio-devel for some systems) installed.
  6. Run Synthalingua: Execute the newly created batch file or bash script. You can modify this file to customize the settings.
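
Before running the setup script, it can help to confirm that the tools from steps 1 to 3 are reachable on PATH. The following is a minimal sketch using only the Python standard library. The helper find_missing_tools is hypothetical and is not shipped with Synthalingua, and the executable names are assumptions (on some Linux systems the interpreter is named python3 rather than python).

```python
import shutil

# Executables the installation steps expect to find on PATH (assumed names).
REQUIRED_TOOLS = ("python", "git", "ffmpeg")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

The `which` parameter exists only so the lookup can be replaced in tests. In normal use, call find_missing_tools() with no arguments.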

Usage

Command Line Arguments

Synthalingua utilizes command line arguments to configure its behavior. Below is a table detailing the available arguments:

Flag Description
--ram Specify the amount of RAM to allocate. Default: 4GB. Options: "1GB", "2GB", "4GB", "6GB", "12GB".
--ramforce Force the script to use the specified VRAM amount. Caution: May lead to crashes if insufficient VRAM is available.
--energy_threshold Set the microphone's audio detection sensitivity. Default: 100. Range: 1-1000 (higher values decrease sensitivity).
--mic_calibration_time Duration in seconds for microphone calibration. Set to 0 to skip user input and use the default 5 seconds.
--record_timeout Real-time recording duration in seconds. Default: 2 seconds.
--phrase_timeout Silence duration in seconds between recordings before considering it a new line. Default: 1 second.
--translate Enable translation of transcriptions to English.
--transcribe Enable transcription of audio to the specified target language. Requires the --target_language flag.
--target_language Specify the target language for translation or transcription. Use ISO 639-1 language codes or their English names.
--language Specify the source language for translation. Use ISO 639-1 language codes or their English names.
--auto_model_swap Enable automatic model switching based on the detected language.
--device Select the processing unit for the model. Default: "cuda" (if available). Options: "cpu", "cuda".
--cuda_device Specify the CUDA device ID to utilize. Default: 0.
--discord_webhook Set the Discord webhook URL to receive transcriptions.
--list_microphones Display a list of available microphones and exit.
--set_microphone Set the default microphone using its name or ID from the list generated by --list_microphones.
--microphone_enabled Enable or disable microphone usage. Use true or false after the flag.
--auto_language_lock Automatically lock onto the detected language after 5 detections. Reduces latency.
--use_finetune Utilize the fine-tuned model for increased accuracy (at the cost of higher latency and resource usage).
--no_log Display only the most recent translation/transcription instead of a log-style output.
--updatebranch Specify the repository branch to check for updates. Default: "master". Options: "master", "dev-testing", "bleeding-under-work", "disable".
--keep_temp Retain audio files in the "out" folder. Note: This will consume storage space over time.
--portnumber Set the port number for the web server. If not specified, the web server will not start.
--retry Enable retrying translations and transcriptions in case of failures.
--about Display information about the application.
--save_transcript Enable saving the transcript to a text file.
--save_folder Specify the folder to save the transcript to.
--stream Stream audio from a specified HLS stream URL.
--stream_language Specify the language of the audio stream. Default: English.
--stream_target_language Specify the target language for stream translation or transcription. Default: English.
--stream_translate Enable translation of the audio stream.
--stream_transcribe Enable transcription of the audio stream to the specified target language.
--stream_original_text Display the detected original text from the stream.
--stream_chunks Specify the number of chunks to split the stream into. Default: 5 (recommended range: 3-5 for most streams, 1-2 for YouTube, 5-10 for Twitch).
--cookies Specify the filename of the cookies file (without extension) located in the "cookies" folder.
--makecaptions Enable caption generation mode. Requires --file_input, --file_output, and --file_output_name flags.
--file_input Specify the path to the input audio/video file for caption generation.
--file_output Specify the folder to save the generated captions to.
--file_output_name Specify the filename for the generated captions (without extension).
--ignorelist Specify the path to a text file containing a list of words or phrases to ignore.
--condition_on_previous_text Enable conditioning the model on previous text to reduce repetition (may impact speed).
--remote_hls_password_id Specify the password ID for accessing password-protected HLS streams. Default: "key".
--remote_hls_password Specify the password for accessing password-protected HLS streams.
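
As a rough illustration of the --phrase_timeout description in the table above, the sketch below groups timed audio chunks into output lines and starts a new line whenever the silence between two chunks is longer than the timeout. This is not Synthalingua's actual implementation. The function group_into_lines and the (start, end, text) tuple layout are assumptions made for the example.

```python
def group_into_lines(chunks, phrase_timeout=1.0):
    """Group (start_time, end_time, text) chunks into lines.

    A new line starts when the silence between the end of one chunk and
    the start of the next is longer than phrase_timeout seconds.
    """
    lines = []
    previous_end = None
    for start, end, text in chunks:
        if previous_end is None or start - previous_end > phrase_timeout:
            lines.append([text])    # long silence: begin a new line
        else:
            lines[-1].append(text)  # short gap: continue the current line
        previous_end = end
    return [" ".join(parts) for parts in lines]
```

With the default of 1 second, chunks separated by half a second of silence end up on the same line, while a two-second pause begins a new one.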

Examples

Caption Generation:

python transcribe_audio.py --ram 12gb --makecaptions --file_input="C:\Users\username\Downloads\video.mp4" --file_output="C:\Users\username\Downloads" --file_output_name="captions" --language Japanese --device cuda
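
When launching caption generation from another script, building the command as an argument list avoids shell quoting problems with Windows paths. The sketch below also enforces the rule that --makecaptions needs --file_input, --file_output, and --file_output_name. The helper build_caption_command is hypothetical, and it assumes the script accepts "--flag value" as well as "--flag=value", as parsers built on Python's argparse do.

```python
import sys


def build_caption_command(file_input, file_output, file_output_name,
                          language=None, ram=None, device=None):
    """Build an argument list for caption generation mode."""
    required = {
        "--file_input": file_input,
        "--file_output": file_output,
        "--file_output_name": file_output_name,
    }
    missing = [flag for flag, value in required.items() if not value]
    if missing:
        raise ValueError(f"--makecaptions also requires: {', '.join(missing)}")

    command = [sys.executable, "transcribe_audio.py", "--makecaptions"]
    for flag, value in required.items():
        command += [flag, value]
    # Optional flags are only added when a value was supplied.
    for flag, value in (("--language", language), ("--ram", ram), ("--device", device)):
        if value:
            command += [flag, value]
    return command
```

The resulting list can be passed to subprocess.run(command, check=True) from the Synthalingua folder.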

Live Stream Translation:

python transcribe_audio.py --ram 12gb --stream_translate --stream_language Japanese --stream https://www.twitch.tv/somestreamerhere
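
The --stream_chunks description recommends different ranges depending on the platform (1-2 for YouTube, 5-10 for Twitch, 3-5 for most other streams). A small helper can look up the range from the stream URL. This is a sketch only: suggested_stream_chunks is a hypothetical name, and matching on the hostname is an assumption about how to tell the platforms apart.

```python
from urllib.parse import urlparse


def suggested_stream_chunks(url):
    """Return the (low, high) --stream_chunks range suggested for a stream URL."""
    host = (urlparse(url).hostname or "").lower()
    if host in ("youtu.be", "youtube.com") or host.endswith(".youtube.com"):
        return (1, 2)
    if host == "twitch.tv" or host.endswith(".twitch.tv"):
        return (5, 10)
    return (3, 5)  # the documented range for most other streams
```

For the Twitch URL in the example above, this returns the 5-10 range, so a value such as --stream_chunks 5 would be a reasonable starting point.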

Discord Integration:

python transcribe_audio.py --ram 6gb --translate --language ja --discord_webhook "https://discord.com/api/webhooks/1234567890/1234567890" --energy_threshold 300

Setting Microphone:

  1. List microphones: python transcribe_audio.py --list_microphones
  2. Set microphone: python transcribe_audio.py --set_microphone "Microphone Name" or python transcribe_audio.py --set_microphone 2 (using index)

Web Server

Start the web server using the --portnumber flag:

python transcribe_audio.py --portnumber 4000

Access the web interface at http://localhost:4000. Use query parameters to control element visibility:

  • ?showoriginal: Show original detected text.
  • ?showtranslation: Show translated text.
  • ?showtranscription: Show transcribed text.
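
For example, a URL for a browser source or overlay can be assembled from the port and the parameters listed above. The helper web_interface_url is hypothetical. The sketch assumes that the parameters are bare keys without values and that several of them can be combined with "&", which this page does not state explicitly.

```python
KNOWN_PARAMETERS = ("showoriginal", "showtranslation", "showtranscription")


def web_interface_url(port, show=(), host="localhost"):
    """Build the web interface URL with the chosen visibility parameters."""
    unknown = [name for name in show if name not in KNOWN_PARAMETERS]
    if unknown:
        raise ValueError(f"unknown query parameters: {unknown}")
    url = f"http://{host}:{port}"
    if show:
        # Assumption: parameters are bare keys joined with "&".
        url += "/?" + "&".join(show)
    return url
```

Calling web_interface_url(4000, ["showoriginal", "showtranslation"]) builds a URL that requests the original and translated text only.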

Word Block List

Use the --ignorelist flag to specify a text file containing words or phrases to exclude from the output:

python transcribe_audio.py --ignorelist "C:\path\to\wordlist.txt"
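
The word list itself can be created in any text editor or generated by a script. The sketch below writes one entry per line in UTF-8. Both the one-entry-per-line layout and the encoding are assumptions, since this page does not specify the file format, and write_ignorelist is a hypothetical helper.

```python
from pathlib import Path


def write_ignorelist(path, phrases):
    """Write phrases to a UTF-8 text file, one per line.

    Blank entries and duplicates are skipped. Returns the entries written.
    """
    entries = []
    for phrase in phrases:
        cleaned = phrase.strip()
        if cleaned and cleaned not in entries:
            entries.append(cleaned)
    Path(path).write_text("\n".join(entries) + "\n", encoding="utf-8")
    return entries
```

The path passed to write_ignorelist is the same path later given to --ignorelist.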

Cookies

Place cookie files in the "cookies" folder in Netscape format (.txt). Use the --cookies flag to specify the filename without the extension:

python transcribe_audio.py --cookies twitchacc1
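
A cookie file that is not in Netscape format is a common source of failures, so it can be worth checking the file before use. The sketch below relies on MozillaCookieJar from Python's standard library, which reads this format. The helpers cookie_file_for and count_cookies are hypothetical, and the mapping from the --cookies value to cookies/<name>.txt follows the description above rather than Synthalingua's source code.

```python
from http.cookiejar import MozillaCookieJar
from pathlib import Path


def cookie_file_for(name, folder="cookies"):
    """Map a --cookies value such as "twitchacc1" to cookies/twitchacc1.txt."""
    return Path(folder) / f"{name}.txt"


def count_cookies(path):
    """Load a Netscape-format cookie file and return how many cookies it holds.

    Raises http.cookiejar.LoadError if the file is not in Netscape format
    and OSError if it cannot be read.
    """
    jar = MozillaCookieJar(str(path))
    # Keep session and expired cookies so the count reflects the file contents.
    jar.load(ignore_discard=True, ignore_expires=True)
    return len(jar)
```

If count_cookies(cookie_file_for("twitchacc1")) raises LoadError, the file is most likely missing the "# Netscape HTTP Cookie File" header line or is not tab-separated.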

Troubleshooting

Refer to the Troubleshooting section in the main README for solutions to common issues.

Additional Information

  • Models: Synthalingua utilizes fine-tuned models based on OpenAI's Whisper.
  • Support: For assistance or to report issues, please create an issue on the GitHub repository.

Contributing

We welcome contributions to Synthalingua! Please refer to the Contribution Guidelines for information on how to contribute.