fast-langdetect 🚀

Overview

fast-langdetect provides fast, accurate language detection built on FastText, a library developed by Facebook. The project claims it is 80x faster than traditional methods with 95% accuracy; see the Accuracy section for the benchmark reference.

It supports Python versions 3.9 to 3.12.

This project builds upon zafercavdar/fasttext-langdetect with enhancements in packaging.

For more information on the underlying FastText model, refer to the official documentation: FastText Language Identification.

Note

Even in low-memory mode, this library requires over 200MB of memory.

Installation 💻

To install fast-langdetect, you can use either pip or pdm:

Using pip

pip install fast-langdetect

Using pdm

pdm add fast-langdetect

Usage 🖥️

For optimal performance and accuracy in language detection, use `detect(text, low_memory=False)` to load the larger model.

The model will be downloaded to the `/tmp/fasttext-langdetect` directory upon first use.
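
A minimal sketch of the call recommended above, wrapped in a small helper. The name `detect_accurate` and its `detector` parameter are hypothetical and not part of fast-langdetect; only `detect(text, low_memory=False)` comes from this README. The optional `detector` argument is there so the helper can be tried with a stand-in function before the model has been downloaded.

```python
from typing import Any, Callable, Dict, Optional


def detect_accurate(
    text: str,
    detector: Optional[Callable[..., Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Run detection with the larger model by passing low_memory=False.

    `detect_accurate` is an illustrative name, not a library function.
    """
    if detector is None:
        # Imported lazily so this module loads even if the package
        # is not installed yet.
        from fast_langdetect import detect as detector
    return detector(text, low_memory=False)
```

With the package installed, `detect_accurate("Hello, world!")` should return the same `lang`/`score` dictionary shape shown in the Native API examples.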

Native API (Recommended)

from fast_langdetect import detect, detect_multilingual

# Single language detection
print(detect("Hello, world!"))
# Output: {'lang': 'en', 'score': 0.1520957201719284}

print(detect("Привет, мир!")["lang"])
# Output: ru

# Multi-language detection
print(detect_multilingual("Hello, world!你好世界!Привет, мир!"))
# Output: [
#     {'lang': 'ru', 'score': 0.39008623361587524},
#     {'lang': 'zh', 'score': 0.18235979974269867},
# ]
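
The scores above can be low for short or mixed input, so callers may want to keep only the languages that clear a threshold. The following is a sketch under that assumption: `confident_languages` and the `min_score` default of 0.3 are hypothetical choices, not part of the library. It only relies on the `{'lang': ..., 'score': ...}` result shape shown in the output above, and the sample data is copied from that output.

```python
from typing import Any, Dict, List


def confident_languages(
    results: List[Dict[str, Any]], min_score: float = 0.3
) -> List[str]:
    """Return language codes scoring at least min_score, best first."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [r["lang"] for r in ranked if r["score"] >= min_score]


# Sample copied from the detect_multilingual output shown above.
sample = [
    {"lang": "ru", "score": 0.39008623361587524},
    {"lang": "zh", "score": 0.18235979974269867},
]

print(confident_languages(sample))  # ['ru']
```

The right threshold depends on the text length and the model in use, so treat 0.3 as a placeholder to tune rather than a recommended value.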

Convenient `detect_language` Function

from fast_langdetect import detect_language

# Single language detection
print(detect_language("Hello, world!"))
# Output: EN

print(detect_language("Привет, мир!"))
# Output: RU

print(detect_language("你好，世界！"))
# Output: ZH
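
Note that the examples show `detect_language` returning uppercase codes (`EN`), while `detect` returns lowercase ones (`en`). A small sketch of routing on the result without caring which form was used; the `GREETINGS` table and `greeting_for` helper are hypothetical and exist only for illustration.

```python
# Hypothetical routing table keyed by the uppercase codes shown above.
GREETINGS = {"EN": "Hello!", "RU": "Привет!", "ZH": "你好！"}


def greeting_for(code: str, default: str = "Hello!") -> str:
    """Look up a greeting by language code, ignoring case."""
    return GREETINGS.get(code.upper(), default)


print(greeting_for("RU"))  # Привет!
print(greeting_for("ru"))  # Привет!
print(greeting_for("FR"))  # Hello!
```

Normalizing the case at the lookup lets the same table serve results from either function.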

Splitting Text by Language 🌐

For text splitting based on language, please refer to the split-lang repository.

Accuracy 🎯

For detailed benchmark results, refer to zafercavdar/fasttext-langdetect#benchmark.

