One .NET library to consume OpenAI, Anthropic, Cohere, Google, Azure, Groq, and self-hosted APIs.
Updated Jul 19, 2024 - C#
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Official implementation for the paper *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
FlashInfer: Kernel Library for LLM Serving
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
ChatAlice is a robust, cross-platform desktop application for macOS, Windows, and Linux. It supports API integration with major large language models (LLMs), including ChatGPT, Claude, and others.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
Sparsity-aware deep learning inference runtime for CPUs
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
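Multi-LoRA servers of this kind typically expose an HTTP generate endpoint where the fine-tuned adapter is selected per request. Below is a minimal sketch of building such a request body; the field names (`inputs`, `parameters.adapter_id`) follow LoRAX's documented REST API, but treat the exact endpoint and names as assumptions to verify against the server's docs:

```python
import json

def build_generate_request(prompt: str, adapter_id: str, max_new_tokens: int = 64) -> str:
    """Build a JSON body that selects a LoRA adapter per request.
    Field names are assumed from LoRAX's REST API docs."""
    payload = {
        "inputs": prompt,
        "parameters": {
            "adapter_id": adapter_id,        # which fine-tuned LoRA to apply
            "max_new_tokens": max_new_tokens,
        },
    }
    return json.dumps(payload)

body = build_generate_request("Summarize: ...", "acme/support-lora")
print(body)
```

Because the adapter is chosen in the request rather than at server start-up, one base model deployment can serve thousands of fine-tunes.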
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
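The RAG piece of a stack like this reduces to nearest-neighbor search over embedding vectors. A toy stdlib-only sketch of cosine-similarity retrieval (the document names and embedding values here are made up; a real deployment would compute embeddings with a model and store them in a vector DB):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical 3-dimensional document embeddings, purely for illustration.
docs = {
    "intro.md": [0.9, 0.1, 0.0],
    "api.md":   [0.1, 0.9, 0.2],
    "faq.md":   [0.2, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k document names most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

top = retrieve([0.15, 0.85, 0.1])
print(top)  # the query vector is closest to api.md's embedding
```

The retrieved documents are then pasted into the LLM prompt as context, which is the "retrieval-augmented" part of RAG.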
Multi-modal Chatbot based on OpenAI
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
🌱 EcoLogits tracks the energy consumption and environmental footprint of using generative AI models through APIs.
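Footprint trackers of this kind generally multiply token counts by per-token energy factors and a grid carbon intensity. A back-of-the-envelope sketch with made-up constants (the real library derives its factors from model size and hardware; these numbers are purely illustrative, not EcoLogits' actual methodology):

```python
# Purely illustrative constants -- NOT EcoLogits' actual factors.
ENERGY_PER_OUTPUT_TOKEN_KWH = 3e-6   # hypothetical kWh per generated token
CARBON_INTENSITY_KG_PER_KWH = 0.4    # hypothetical grid carbon intensity (kg CO2e/kWh)

def estimate_footprint(output_tokens: int):
    """Return (energy in kWh, emissions in kg CO2e) for a completion."""
    energy_kwh = output_tokens * ENERGY_PER_OUTPUT_TOKEN_KWH
    co2_kg = energy_kwh * CARBON_INTENSITY_KG_PER_KWH
    return energy_kwh, co2_kg

# A 1000-token completion under these assumed constants:
energy, co2 = estimate_footprint(1000)
```

Tracking happens per API call, so totals can be aggregated across an application's whole generative-AI usage.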