⚡️ 80x faster language detection with Fasttext | Split text by language for TTS

LlmKira/fast-langdetect

fast-langdetect 🚀

Overview

fast-langdetect is a fast, accurate language detection package built on FastText, a library developed by Facebook. It is 80x faster than traditional detection methods and reaches 95% accuracy.

It supports Python versions 3.9 to 3.12.

This project builds on zafercavdar/fasttext-langdetect and improves its packaging.

For more information on the underlying FastText model, refer to the official documentation: FastText Language Identification.

Note

Even in low-memory mode, this library requires over 200 MB of memory.

Installation 💻

To install fast-langdetect, you can use either pip or pdm:

Using pip

pip install fast-langdetect

Using pdm

pdm add fast-langdetect

Usage 🖥️

For optimal performance and accuracy in language detection, use detect(text, low_memory=False) to load the larger model.

The model will be downloaded to the /tmp/fasttext-langdetect directory upon first use.
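
To check whether a model has already been downloaded, the cache directory named above can be inspected with the standard library. This is a minimal sketch: `cached_model_files` is a hypothetical helper, not part of fast-langdetect, and it makes no assumption about the model's file name.

```python
from pathlib import Path

# Cache location stated in this README; the helper itself is hypothetical.
CACHE_DIR = Path("/tmp/fasttext-langdetect")


def cached_model_files(cache_dir: Path = CACHE_DIR) -> list:
    """Return sorted (file name, size in MB) pairs, or [] if the directory is missing."""
    if not cache_dir.is_dir():
        return []
    return sorted(
        (p.name, round(p.stat().st_size / 1e6, 1))
        for p in cache_dir.iterdir()
        if p.is_file()
    )


if __name__ == "__main__":
    files = cached_model_files()
    if files:
        for name, size_mb in files:
            print(f"{name}: {size_mb} MB")
    else:
        print("No cached model found; it should be downloaded on first use.")
```

An empty result simply means that no detection call has triggered the download yet on this machine.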

Native API (Recommended)

from fast_langdetect import detect, detect_multilingual

# Single language detection
print(detect("Hello, world!"))
# Output: {'lang': 'en', 'score': 0.1520957201719284}

print(detect("Привет, мир!")["lang"])
# Output: ru

# Multi-language detection
print(detect_multilingual("Hello, world!你好世界!Привет, мир!"))
# Output: [
#     {'lang': 'ru', 'score': 0.39008623361587524},
#     {'lang': 'zh', 'score': 0.18235979974269867},
# ]
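
The `score` field can be used to discard low-confidence results. The sketch below assumes results shaped like the dictionaries shown above; `confident_lang`, `filter_candidates`, and the threshold values are hypothetical and not part of fast-langdetect.

```python
from typing import List, Optional


def confident_lang(result: dict, min_score: float = 0.5) -> Optional[str]:
    """Return the language code only if its score meets the threshold."""
    if result["score"] >= min_score:
        return result["lang"]
    return None


def filter_candidates(results: List[dict], min_score: float = 0.2) -> List[str]:
    """Keep the language codes of candidates scoring at least min_score."""
    return [r["lang"] for r in results if r["score"] >= min_score]


# Sample values copied from the outputs shown above.
single = {"lang": "en", "score": 0.1520957201719284}
multi = [
    {"lang": "ru", "score": 0.39008623361587524},
    {"lang": "zh", "score": 0.18235979974269867},
]

print(confident_lang(single))    # None, since 0.152 is below 0.5
print(filter_candidates(multi))  # ['ru']
```

Suitable thresholds depend on text length and on which model is loaded, so the values here are placeholders to tune against real data.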

Convenient detect_language Function

from fast_langdetect import detect_language

# Single language detection
print(detect_language("Hello, world!"))
# Output: EN

print(detect_language("Привет, мир!"))
# Output: RU

print(detect_language("你好,世界!"))
# Output: ZH
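
Because `detect_language` returns a bare code, it is convenient for bucketing strings by language. The sketch below takes any callable with that shape; `group_by_language` and `stub_detector` are hypothetical names used for illustration, and the stub stands in for the real function so the example runs without downloading a model.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_language(
    texts: Iterable[str], detector: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group texts under the lower-cased code returned by detector."""
    groups = defaultdict(list)
    for text in texts:
        groups[detector(text).lower()].append(text)
    return dict(groups)


def stub_detector(text: str) -> str:
    """Toy stand-in for detect_language: ASCII-only text is 'EN', anything else 'RU'."""
    return "EN" if text.isascii() else "RU"


samples = ["Hello, world!", "Привет, мир!", "Good morning"]
print(group_by_language(samples, stub_detector))
# {'en': ['Hello, world!', 'Good morning'], 'ru': ['Привет, мир!']}
```

With the package installed, passing `detect_language` in place of `stub_detector` should work the same way, assuming it returns upper-case codes as in the outputs above.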

Splitting Text by Language 🌐

For text splitting based on language, please refer to the split-lang repository.

Accuracy 🎯

For detailed benchmark results, refer to zafercavdar/fasttext-langdetect#benchmark.