Mozilla-Ocho / llamafile
Commit History (branch: main)
Commits on Jul 6, 2024
- Make GGML asynchronously cancelable (jart, b3930aa)
Commits on Jul 5, 2024
- Add support for JSON parameters to new server (jart, d7c8e33)
Commits on Jul 4, 2024
- Revert "Disable warmup" (jart, 1601118)
- Disable warmup (jart, 21a30be)
Commits on Jul 1, 2024
- Release llamafile v0.8.9 (jart, cd84736)
- Make gemma2-27b-it the same as aistudio.google.com (jart, af22695)
- Reclaim llama_decode() memory on cancelation (jart, 0d62d05)
- Remove ggml_context cache (jart, 617d841)
- Upgrade to Cosmopolitan v3.5.4 (jart, 3af1ac0)
- Use float to string conversion (jart, 263d39b)
- Upgrade to Cosmopolitan v3.5.3 (jart, 3fc00b1)
Commits on Jun 30, 2024
- Create /embedding endpoint in new server (jart, 1346ef4)
- Refactor new server and get leak checker working (jart, 46dda4f)
- Prevent vector overflow in llama.cpp (jart, cd73243)
Commits on Jun 29, 2024
- Release llamafile v0.8.8 (jart, 571b4e5)
- Support flash attention in --server mode (jart, 4aea606)
- Add Google Gemma v2 support (jart, 7692b85)
- Introduce --special flag (jart, 72fb8ca)
- Don't flush bf16 subnormals to zero (jart, 7fd9101)
Commits on Jun 24, 2024
- Release llamafile v0.8.7 (jart, b2f587c)
- Cut flash attention from CUDA again (jart, 4d1fde0)
- Fix server crash due to /dev/urandom (jart, 629e208)
- Upgrade to Cosmopolitan v3.5.1 (jart, 0c0e72a)
- Pacify --temp flag when running in server mode (jart, 6d3590c)
Commits on Jun 22, 2024
- Always use tinyBLAS with AMD GPUs on Windows (jart, 60404a8)
Commits on Jun 6, 2024
- Add back missing build rule (jart, 842a421)
Commits on Jun 5, 2024
- Fix the build (jart, 1c08fad)
- Introduce new llamafile server (jart, e0656ea)
- Make the build go a little faster (jart, 8b9be96)
- Add double-conversion (jart, 581a173)
- Improve CPU brand detection (jart, e973fa2)
- Add stable-diffusion.cpp (jart, 3b7b1e3)
Commits on May 25, 2024
- Release llamafile v0.8.6 (jart, 81cfbcf)
- Upgrade to Cosmopolitan v3.3.8 (jart, 866a129)
- Don't print special tokens for now (jart, 69c2dd3)