A high-throughput and memory-efficient inference and serving engine for LLMs
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Standardized Serverless ML Inference Platform on Kubernetes
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
The simplest way to serve AI/ML models in production
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
A scalable inference server for models optimized with OpenVINO™
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list)
Tools for easing the handoff between AI/ML and App/SRE teams.
Fast, easy and cost-efficient multi-LLM serving.
A scalable, high-performance serving system for federated learning models
🏕️ Reproducible development environment
A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
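
To give a concrete sense of what a serving engine like vLLM exposes, below is a minimal offline-inference sketch using vLLM's Python API. The model name, prompts, and sampling values are illustrative placeholders rather than recommendations.

```python
# Minimal vLLM offline-inference sketch (model name and sampling values are illustrative).
from vllm import LLM, SamplingParams

# Load a model; vLLM handles KV-cache memory management and request batching internally.
llm = LLM(model="facebook/opt-125m")

# Sampling parameters applied to each generation request.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts in one call.
outputs = llm.generate(
    ["The capital of France is", "A model-serving engine should"],
    params,
)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```

Most of the serving frameworks listed above wrap this kind of engine behind an HTTP or gRPC endpoint; vLLM itself also ships an OpenAI-compatible API server for online serving.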