Mozilla-Ocho / llamafile
Commit History (branch: main)
Commits on Jul 6, 2024
- Make GGML asynchronously cancelable (jart, b3930aa)
Commits on Jul 5, 2024
- Add support for JSON parameters to new server (jart, d7c8e33)
Commits on Jul 4, 2024
- Revert "Disable warmup" (jart, 1601118)
- Disable warmup (jart, 21a30be)
Commits on Jul 1, 2024
- Release llamafile v0.8.9 (jart, cd84736)
- Make gemma2-27b-it the same as aistudio.google.com (jart, af22695)
- Reclaim llama_decode() memory on cancelation (jart, 0d62d05)
- Remove ggml_context cache (jart, 617d841)
- Upgrade to Cosmopolitan v3.5.4 (jart, 3af1ac0)
- Use float to string conversion (jart, 263d39b)
- Upgrade to Cosmopolitan v3.5.3 (jart, 3fc00b1)
Commits on Jun 30, 2024
- Create /embedding endpoint in new server (jart, 1346ef4)
- Refactor new server and get leak checker working (jart, 46dda4f)
- Prevent vector overflow in llama.cpp (jart, cd73243)
Commits on Jun 29, 2024
- Release llamafile v0.8.8 (jart, 571b4e5)
- Support flash attention in --server mode (jart, 4aea606)
- Add Google Gemma v2 support (jart, 7692b85)
- Introduce --special flag (jart, 72fb8ca)
- Don't flush bf16 subnormals to zero (jart, 7fd9101)
Commits on Jun 24, 2024
- Release llamafile v0.8.7 (jart, b2f587c)
- Cut flash attention from CUDA again (jart, 4d1fde0)
- Fix server crash due to /dev/urandom (jart, 629e208)
- Upgrade to Cosmopolitan v3.5.1 (jart, 0c0e72a)
- Pacify --temp flag when running in server mode (jart, 6d3590c)
Commits on Jun 22, 2024
- Always use tinyBLAS with AMD GPUs on Windows (jart, 60404a8)
Commits on Jun 6, 2024
- Add back missing build rule (jart, 842a421)
Commits on Jun 5, 2024
- Fix the build (jart, 1c08fad)
- Introduce new llamafile server (jart, e0656ea)
- Make the build go a little faster (jart, 8b9be96)
- Add double-conversion (jart, 581a173)
- Improve CPU brand detection (jart, e973fa2)
- Add stable-diffusion.cpp (jart, 3b7b1e3)
Commits on May 25, 2024
- Release llamafile v0.8.6 (jart, 81cfbcf)
- Upgrade to Cosmopolitan v3.3.8 (jart, 866a129)
- Don't print special tokens for now (jart, 69c2dd3)