ONNX Runtime Generative AI

Run generative AI models with ONNX Runtime.

This library provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.

Users can call a high-level generate() method, or run each iteration of the model in a loop themselves.
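The sketch below shows the kind of loop that a single generate() call runs. It is an illustration, not this library's implementation. `toy_step` is a hypothetical stand-in for one ONNX Runtime forward pass. The "KV cache" is reduced to a list of tokens the model has already processed, so after the first step only the newest token is fed in. The next token is picked greedily.

```python
from typing import Callable, List, Tuple

# Hypothetical forward pass: (new tokens, cached state) -> (next-token logits, updated cache).
# A real model would keep per-layer key/value tensors instead of a token list.
Step = Callable[[List[int], List[int]], Tuple[List[float], List[int]]]


def toy_step(new_tokens: List[int], cache: List[int]) -> Tuple[List[float], List[int]]:
    cache = cache + new_tokens              # "KV cache": everything processed so far
    vocab_size = 5
    last = cache[-1]
    # Toy scoring rule: prefer the token that follows the last one (mod vocab size).
    logits = [1.0 if t == (last + 1) % vocab_size else 0.0 for t in range(vocab_size)]
    return logits, cache


def greedy_generate(step: Step, input_ids: List[int], max_length: int, eos_id: int) -> List[int]:
    tokens = list(input_ids)
    cache: List[int] = []
    new_tokens = list(input_ids)            # the first step processes the whole prompt
    while len(tokens) < max_length:
        logits, cache = step(new_tokens, cache)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy choice
        tokens.append(next_id)
        if next_id == eos_id:
            break
        new_tokens = [next_id]              # later steps feed only the newest token
    return tokens


if __name__ == "__main__":
    print(greedy_generate(toy_step, [0, 1], max_length=8, eos_id=4))  # prints [0, 1, 2, 3, 4]
```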

- Supports greedy and beam search, and Top-P and Top-K sampling, to generate token sequences
- Built-in logits processing, such as repetition penalties
- Easy custom scoring
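Here is a minimal sketch of the logits processing and sampling steps listed above. It is written in plain Python and does not use the library's API. The helper names are hypothetical. The order (repetition penalty, then Top-K, then Top-P, then sampling) follows a common convention, not necessarily this library's. The penalty divides positive logits and multiplies negative ones, which is one widely used formulation.

```python
import math
import random
from typing import List, Sequence


def apply_repetition_penalty(logits: List[float], seen: Sequence[int], penalty: float) -> List[float]:
    """Lower the scores of tokens that already appeared (penalty > 1 discourages repeats)."""
    out = list(logits)
    for t in set(seen):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # masked (-inf) entries become 0.0
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(logits: List[float], k: int) -> List[float]:
    """Keep the k highest logits and mask the rest with -inf."""
    keep = set(sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k])
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


def top_p_filter(logits: List[float], p: float) -> List[float]:
    """Keep the smallest set of most likely tokens whose probabilities sum to at least p."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


def sample(logits: List[float], rng: random.Random) -> int:
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def next_token(logits: List[float], seen: Sequence[int], rng: random.Random,
               penalty: float = 1.1, k: int = 50, p: float = 0.9) -> int:
    logits = apply_repetition_penalty(logits, seen, penalty)
    logits = top_k_filter(logits, k)
    logits = top_p_filter(logits, p)
    return sample(logits, rng)
```

A custom scoring rule fits into the same pipeline as one more function that maps a logits list to a new logits list.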

Features

Supported model architectures:
- Gemma
- LLaMA
- Mistral
- Phi-2
Supported targets:
- CPU
- GPU (CUDA)
Supported sampling features:
- Beam search
- Greedy search
- Top P/Top K
APIs:
- Python
- C#
- C/C++
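Beam search keeps several partial sequences alive at each step instead of committing to a single token. The sketch below illustrates the idea with a hypothetical `log_probs` scorer. It is not the library's implementation and applies no length penalty, so a short finished sequence can outrank longer ones.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer: log-probabilities over the vocabulary for the next token.
LogProbFn = Callable[[Sequence[int]], List[float]]
Hypothesis = Tuple[List[int], float]


def beam_search(log_probs: LogProbFn, prompt: Sequence[int], num_beams: int,
                max_length: int, eos_id: int) -> List[Hypothesis]:
    """Return all kept hypotheses, best (highest total log-probability) first."""
    beams: List[Hypothesis] = [(list(prompt), 0.0)]
    finished: List[Hypothesis] = []
    while beams and len(beams[0][0]) < max_length:
        # Expand every live beam by every token in the vocabulary.
        candidates = [(tokens + [tok], score + lp)
                      for tokens, score in beams
                      for tok, lp in enumerate(log_probs(tokens))]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates:
            if tokens[-1] == eos_id:
                finished.append((tokens, score))  # set aside sequences that ended
            else:
                beams.append((tokens, score))
            if len(beams) == num_beams:           # keep only the top live beams
                break
    finished.extend(beams)  # beams that reached max_length without EOS
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_log_probs(prefix: Sequence[int]) -> List[float]:
    # Hypothetical fixed distribution over a 3-token vocabulary; token 2 acts as EOS.
    return [math.log(0.2), math.log(0.3), math.log(0.5)]
```

With `toy_log_probs`, `beam_search(toy_log_probs, [0], num_beams=2, max_length=4, eos_id=2)` ranks `[0, 2]` first because ending immediately has the highest total log-probability.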

Coming very soon

Support for the Whisper model architecture
Support for DirectML

Roadmap

Automatic model download and cache
More model architectures

Sample code for phi-2 in Python

Install onnxruntime-genai.

(Temporary) Build and install from source according to the instructions below.

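```
import onnxruntime_genai as og

# Assumption: this API version exposes og.DeviceType; use og.DeviceType.CUDA for a GPU build.
device_type = og.DeviceType.CPU

# Folder produced by the model builder (see "Model download and export").
model = og.Model('models/microsoft/phi-2', device_type)
tokenizer = og.Tokenizer(model)

prompt = '''def print_prime(n):
    """
    Print all primes between 1 and n
    """'''

tokens = tokenizer.encode(prompt)

# Search options: stop once the sequence reaches 200 tokens.
params = og.SearchParams(model)
params.set_search_options({"max_length": 200})
params.input_ids = tokens

output_tokens = model.generate(params)

text = tokenizer.decode(output_tokens)

print("Output:")
print(text)
```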

Build from source

This step requires CMake to be installed.

Clone this repo

```
git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai
```

Install ONNX Runtime

By default, the onnxruntime-genai build expects to find the ONNX Runtime header files and binaries in a folder called ort in the root directory of onnxruntime-genai. You can put the ONNX Runtime files in a different location and point the onnxruntime-genai build at it instead. These instructions refer to that location as ORT_HOME.
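If the build cannot find ONNX Runtime, a quick check of the ORT_HOME layout can help narrow it down. The helper below is hypothetical. It assumes the layout produced by the steps that follow: the C API header directly under include/ and the ONNX Runtime shared libraries under lib/.

```python
import os
from pathlib import Path
from typing import List


def check_ort_home(ort_home: str) -> List[str]:
    """Return the problems found in an ORT_HOME folder; an empty list means it looks OK."""
    root = Path(ort_home)
    problems = []
    header = root / "include" / "onnxruntime_c_api.h"
    if not header.is_file():
        problems.append(f"missing header: {header}")
    lib_dir = root / "lib"
    libs = [p.name for p in lib_dir.glob("*onnxruntime*")] if lib_dir.is_dir() else []
    if not libs:
        problems.append(f"no onnxruntime libraries found in {lib_dir}")
    return problems


if __name__ == "__main__":
    # Fall back to the default "ort" folder when ORT_HOME is not set.
    home = os.environ.get("ORT_HOME", "ort")
    for line in check_ort_home(home) or ["ORT_HOME layout looks OK"]:
        print(line)
```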

Install from release

These instructions are for the Linux GPU build of ONNX Runtime. Replace the archive name with the one for your operating system and target.

```
cd $ORT_HOME
wget https://github.com/microsoft/onnxruntime/releases/download/v1.17.1/onnxruntime-linux-x64-gpu-1.17.1.tgz
tar xvzf onnxruntime-linux-x64-gpu-1.17.1.tgz
mv onnxruntime-linux-x64-gpu-1.17.1/include .
mv onnxruntime-linux-x64-gpu-1.17.1/lib .
```
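The release steps above can also be scripted. This sketch assumes that release archives follow the naming pattern of the Linux GPU example (onnxruntime-<platform>-<arch>[-gpu]-<version>.tgz). It only handles .tgz archives, and releases for other platforms may be packaged differently. The function names are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path

RELEASE_BASE = "https://github.com/microsoft/onnxruntime/releases/download"


def release_archive_name(version: str, platform: str = "linux", arch: str = "x64",
                         gpu: bool = False) -> str:
    # Assumed pattern, taken from the Linux GPU example above.
    suffix = "-gpu" if gpu else ""
    return f"onnxruntime-{platform}-{arch}{suffix}-{version}.tgz"


def release_url(version: str, **kwargs) -> str:
    return f"{RELEASE_BASE}/v{version}/{release_archive_name(version, **kwargs)}"


def install_from_archive(archive: str, ort_home: str) -> None:
    """Unpack a downloaded release .tgz and move its include/ and lib/ into ort_home.

    Assumes ort_home does not already contain include/ or lib/.
    Only extract archives from a source you trust.
    """
    home = Path(ort_home)
    home.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(home)
    top = home / Path(archive).name.removesuffix(".tgz")  # requires Python 3.9+
    for sub in ("include", "lib"):
        shutil.move(str(top / sub), str(home / sub))
```

Download the file at `release_url(...)` with any HTTP client, then pass the local path to `install_from_archive`.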

Or build from source

```
git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
```

Create include and lib folders in the ORT_HOME directory

```
mkdir $ORT_HOME/include
mkdir $ORT_HOME/lib
```

Build from source and copy the header and libraries into ORT_HOME.

On Windows

```
build.bat --config RelWithDebInfo --build_shared_lib --skip_tests --parallel [--use_cuda]
copy include\onnxruntime\core\session\onnxruntime_c_api.h %ORT_HOME%\include
copy build\Windows\RelWithDebInfo\RelWithDebInfo\*.dll %ORT_HOME%\lib
```

On Linux

```
./build.sh --config RelWithDebInfo --build_shared_lib --skip_tests --parallel [--use_cuda]
cp include/onnxruntime/core/session/onnxruntime_c_api.h $ORT_HOME/include
cp build/Linux/RelWithDebInfo/libonnxruntime*.so* $ORT_HOME/lib
```
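The Windows and Linux copy commands above can be combined into a single cross-platform helper. This sketch assumes the build output paths match the ones shown above (build/Windows/<config>/<config> and build/Linux/<config>). `copy_ort_build` is a hypothetical name.

```python
import glob
import shutil
import sys
from pathlib import Path
from typing import List


def copy_ort_build(ort_repo: str, ort_home: str, config: str = "RelWithDebInfo",
                   platform: str = sys.platform) -> List[str]:
    """Copy the C API header and shared libraries from an ONNX Runtime build tree.

    Returns the sorted names of the library files that were copied.
    """
    repo, home = Path(ort_repo), Path(ort_home)
    (home / "include").mkdir(parents=True, exist_ok=True)
    (home / "lib").mkdir(parents=True, exist_ok=True)

    header = repo / "include" / "onnxruntime" / "core" / "session" / "onnxruntime_c_api.h"
    shutil.copy2(header, home / "include")

    if platform.startswith("win"):
        pattern = repo / "build" / "Windows" / config / config / "*.dll"
    else:
        pattern = repo / "build" / "Linux" / config / "libonnxruntime*.so*"

    copied = []
    for lib in glob.glob(str(pattern)):
        shutil.copy2(lib, home / "lib")  # follows symlinks, like cp does
        copied.append(Path(lib).name)
    return sorted(copied)
```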

Build onnxruntime-genai

Return to the onnxruntime-genai root directory and run build.py. If you are building for CUDA, add the --cuda_home argument.
```
cd ..
python build.py [--cuda_home <path_to_cuda_home>]
```
Install the Python wheel
```
cd build/wheel
pip install *.whl
```

Model download and export

Models are loaded from a local folder whose path is passed as a string to the Model() constructor.
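Because Model() takes a folder path, a missing or empty folder is a common cause of load errors. The hypothetical helper below checks that the folder exists and contains at least one .onnx file before you pass its path to og.Model. It does not validate any other files the model may need.

```python
from pathlib import Path
from typing import List


def find_onnx_files(model_dir: str) -> List[str]:
    """List the .onnx files in a model folder; raise if the folder is missing or has none."""
    folder = Path(model_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"model folder not found: {folder}")
    onnx_files = sorted(p.name for p in folder.glob("*.onnx"))
    if not onnx_files:
        raise FileNotFoundError(f"no .onnx files in {folder}")
    return onnx_files
```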

To obtain microsoft/phi-2 optimized for your target, run the model builder as shown below. You need to be logged in to Hugging Face via the CLI to run it.

Install model builder dependencies.

```
pip install numpy
pip install transformers
pip install torch
pip install onnx
pip install onnxruntime
```

Export int4 CPU version

```
huggingface-cli login --token <your HuggingFace token>
python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -p int4 -e cpu -o <model folder>
```
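The export command can also be run from Python, for example to export several precisions in one script. The sketch below wraps the same module invocation and passes only the flags shown above (-m, -p, -e, -o). `builder_command` and `export_model` are hypothetical names, and the script still requires a prior `huggingface-cli login`.

```python
import subprocess
import sys
from typing import List


def builder_command(model: str, precision: str, target: str, output_dir: str) -> List[str]:
    # Same invocation as the shell command above, using the current Python interpreter.
    return [sys.executable, "-m", "onnxruntime_genai.models.builder",
            "-m", model, "-p", precision, "-e", target, "-o", output_dir]


def export_model(model: str, precision: str, target: str, output_dir: str) -> None:
    # check=True raises CalledProcessError if the builder exits with an error.
    subprocess.run(builder_command(model, precision, target, output_dir), check=True)
```

For example, `export_model("microsoft/phi-2", "int4", "cpu", "models/microsoft/phi-2")` runs the same export as the command above and writes the output to the folder that the Python sample loads from.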

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
