A high-throughput and memory-efficient inference and serving engine for LLMs
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Standardized Serverless ML Inference Platform on Kubernetes
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
The simplest way to serve AI/ML models in production
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
A scalable inference server for models optimized with OpenVINO™
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list)
Tools for easing the handoff between AI/ML and App/SRE teams.
Fast, easy and cost-efficient multi-LLM serving.
A scalable, high-performance serving system for federated learning models
🏕️ Reproducible development environment
A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
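
To give a concrete sense of what a serving engine like vLLM exposes, below is a minimal offline-inference sketch using vLLM's Python API. The model name, prompts, and sampling values are illustrative placeholders rather than recommendations.

```python
# Minimal vLLM offline-inference sketch (model name and sampling values are illustrative).
from vllm import LLM, SamplingParams

# Load a model; vLLM handles KV-cache memory management and request batching internally.
llm = LLM(model="facebook/opt-125m")

# Sampling parameters applied to each generation request.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts in one call.
outputs = llm.generate(
    ["The capital of France is", "A model-serving engine should"],
    params,
)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```

Most of the serving frameworks listed above wrap this kind of engine behind an HTTP or gRPC endpoint; vLLM itself also ships an OpenAI-compatible API server for online serving.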