Insights: ggerganov/llama.cpp
Overview
26 Releases published by 1 person

- Jul 11, 2024: b3373, b3374, b3375
- Jul 12, 2024: b3376, b3378, b3381, b3382, b3383, b3384
- Jul 13, 2024: b3385, b3386
- Jul 14, 2024: b3387, b3389
- Jul 15, 2024: b3392, b3393, b3394, b3396, b3398, b3400
- Jul 16, 2024: b3402, b3403, b3405
- Jul 17, 2024: b3406, b3407, b3408
- Jul 18, 2024: b3412
40 Pull requests merged by 22 people

- server: use relative routes for static files in new UI (#8552, merged Jul 18, 2024)
- convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499, merged Jul 18, 2024)
- fix special not working for llama-server (#8553, merged Jul 18, 2024)
- lookup: fibonacci hashing, fix crashes (#8548, merged Jul 17, 2024; see the first sketch after this list)
- build : Fix docker build warnings (#8535) (#8537, merged Jul 17, 2024)
- Update CONTRIBUTING.md to remove mention of noci (#8541, merged Jul 17, 2024)
- [CANN] Add Ascend NPU backend (#6035, merged Jul 17, 2024)
- batched: fix n_predict parameter (#8527, merged Jul 17, 2024)
- llama : disable context-shift for DeepSeek v2 (#8501, merged Jul 17, 2024)
- make/cmake: add missing force MMQ/cuBLAS for HIP (#8515, merged Jul 16, 2024)
- Update clib.json to point to Cyan4973 original xxhash repo (#8491, merged Jul 16, 2024)
- handle export-lora help argument (#8497, merged Jul 16, 2024)
- llama : valign + remove unused ftype (#8502, merged Jul 16, 2024)
- convert_hf : faster lazy safetensors (#8482, merged Jul 16, 2024)
- Refactor lora adapter support (#8332, merged Jul 15, 2024)
- fix ci (#8494, merged Jul 15, 2024)
- ggml : suppress unknown pragma 'GCC' on windows (#8460, merged Jul 15, 2024)
- server: update README.md with llama-server --help's output [no ci] (#8472, merged Jul 15, 2024)
- docs: fix links in development docs [no ci] (#8481, merged Jul 15, 2024)
- [SYCL] add concat through dim 1/2 (#8483, merged Jul 15, 2024)
- Vulkan MMQ Fix (#8479, merged Jul 15, 2024)
- pydantic : replace uses of __annotations__ with get_type_hints (#8474, merged Jul 14, 2024)
- nix: update flake.lock (#8475, merged Jul 14, 2024)
- llama : fix Gemma-2 Query scaling factors (#8473, merged Jul 14, 2024)
- gguf_hash.py: Add sha256 (#8470, merged Jul 14, 2024)
- llama : fix pre-tokenization of non-special added tokens (#8228, merged Jul 14, 2024)
- vulkan : cmake integration (#8119, merged Jul 13, 2024)
- metal : template-ify some of the kernels (#8447, merged Jul 13, 2024)
- server : handle content array in chat API (#8449, merged Jul 12, 2024)
- main : print error on empty input (#8456, merged Jul 12, 2024)
- llama : suppress unary minus operator warning (#8448, merged Jul 12, 2024)
- server: Ensure batches are either all embed or all completion (#8076) (#8420, merged Jul 12, 2024)
- Docker: Fix filename for convert-hf-to-gguf.py in tools.sh (#8441, merged Jul 12, 2024)
- Removing fsep token from GPTRefactForCausalLM (#8237, merged Jul 12, 2024)
- examples : sprintf -> snprintf (#8434, merged Jul 12, 2024; see the second sketch after this list)
- ggml : minor naming changes (#8433, merged Jul 12, 2024)
- [SYCL] fix the mul_mat_id ut issues (#8427, merged Jul 12, 2024)
- ggml : add NVPL BLAS support (#8329) (#8425, merged Jul 11, 2024)
- cuda : suppress 'noreturn' warn in no_device_code (#8414, merged Jul 11, 2024)
- CUDA: optimize and refactor MMQ (#8416, merged Jul 11, 2024)
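Two of the merged PRs above name standard C/C++ techniques that are worth a quick illustration. First, #8548 moves the lookup decoding example to Fibonacci hashing. Below is a minimal, self-contained sketch of that hashing scheme; it is not the PR's actual code, and the helper name, key type, and table size are invented for the example:

```cpp
// Sketch of Fibonacci hashing (the technique named in #8548), not the PR's
// actual code. The idea: multiply the key by floor(2^64 / phi) and keep the
// top bits. The multiplication spreads low-entropy keys (such as small
// sequential token ids) far more evenly across buckets than key % size.
#include <cstdint>
#include <cstdio>

// floor(2^64 / phi), phi = (1 + sqrt(5)) / 2; also written 0x9E3779B97F4A7C15
constexpr uint64_t FIB_MULT = 11400714819323198485ull;

// Hypothetical helper: map a 64-bit key into a table of 2^bits buckets.
static uint32_t fib_hash(uint64_t key, unsigned bits) {
    return (uint32_t)((key * FIB_MULT) >> (64 - bits));
}

int main() {
    // Sequential keys land in widely separated buckets instead of clustering.
    for (uint64_t key = 1; key <= 4; ++key) {
        printf("key %llu -> bucket %u\n",
               (unsigned long long)key, fib_hash(key, 10));
    }
    return 0;
}
```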
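Second, #8434 converts the examples from sprintf to snprintf. A small hedged sketch of the general pattern behind such a change (again, not the PR's actual diff; the buffer and format string are made up):

```cpp
// snprintf is bounded by the destination size, always NUL-terminates when
// size > 0, and returns the length it *wanted* to write, so truncation is
// detectable instead of becoming a buffer overflow.
#include <cstdio>

int main() {
    char buf[16];

    // Unsafe form (what such a change removes): sprintf(buf, "tensor_%d", idx);
    // a large idx silently writes past the end of buf.

    // Bounded form: "tensor_123456789" needs 17 bytes, so it is truncated
    // and the return value reveals it.
    int n = snprintf(buf, sizeof(buf), "tensor_%d", 123456789);
    if (n < 0 || n >= (int)sizeof(buf)) {
        fprintf(stderr, "warning: output truncated\n");
    }
    printf("%s\n", buf);
    return 0;
}
```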
19 Pull requests opened by 13 people

- metal : add BF16 support (#8439, opened Jul 11, 2024)
- chore : Fix vulkan related compiler warnings, add help text, improve CLI options (#8477, opened Jul 14, 2024)
- ggml: Install all public headers in the ggml build regardless of build settings (#8480, opened Jul 14, 2024)
- llama : change fallback type IQ4_NL -> Q4_0 (#8489, opened Jul 15, 2024)
- examples : Rewrite pydantic_models_to_grammar_examples.py (#8493, opened Jul 15, 2024)
- CUDA: MMQ code deduplication + iquant support (#8495, opened Jul 15, 2024)
- docs: added AI Studio to the list of UIs [no ci] (#8505, opened Jul 16, 2024)
- llama : refactoring (#8508, opened Jul 16, 2024)
- llama : simplify Mamba with advanced batch splits (#8526, opened Jul 17, 2024)
- llama : bump max layers from 256 to 512 (#8530, opened Jul 17, 2024)
- Improvements for running on Windows with Snapdragon X (#8531, opened Jul 17, 2024)
- README.md: include steps to run cmake [no ci] (#8540, opened Jul 17, 2024)
- CPU/CUDA: Gemma 2 FlashAttention support (#8542, opened Jul 17, 2024)
- Add support for Chameleon (#8543, opened Jul 17, 2024)
- ggml : add SSM Metal kernels (#8546, opened Jul 17, 2024)
- ggml : fix iq4_nl dot product with odd number of blocks (#8549, opened Jul 17, 2024)
- [SYCL] fix multi-gpu issue on sycl (#8554, opened Jul 18, 2024)
- ggml : fix odd blocks for ARM_NEON (#8556, opened Jul 18, 2024)
- convert-*.py: autogenerate general.uuid if missing (#8565, opened Jul 18, 2024)
56 Issues closed by 15 people

- Support more AMD GPUs like `gfx90c` (#6110, closed Jul 18, 2024)
- Feature request: Graphical GGUF viewer (#6715, closed Jul 18, 2024)
- Server: Multimodal Model Input Parameter No Longer Exists (#7112, closed Jul 18, 2024)
- Bug: Incorrect memory allocation when mixing Nvidia and AMD GPUs (#7674, closed Jul 18, 2024)
- Bug: Could NOT find BLAS (missing: BLAS_LIBRARIES) (#7708, closed Jul 18, 2024)
- Bug: With a small n_ctx_slot, llama.cpp begins producing gibberish (#8498, closed Jul 17, 2024)
- Bug: Failed to load model (#8516, closed Jul 17, 2024)
- Este set (#8523, closed Jul 17, 2024)
- <?xml version="1.0" encoding="UTF-8"?> (#8524, closed Jul 17, 2024)
- Segmentation fault on finetune with -ngl > 0, Debian 12 stable (#6994, closed Jul 17, 2024)
- Feature: support Vulkan devices that don't support 16-bit storage (#7620, closed Jul 17, 2024)
- Bug: value of keep alive max count in cpp-httplib hardcoded too low (#7694, closed Jul 17, 2024)
- Feature Request: GGUF 2 BIN (#7695, closed Jul 17, 2024)
- Feature Request: Prevent server.exe from being detected as Trojan:Win32/Wacatac.B!ml (#7704, closed Jul 17, 2024)
- Bug: GGML_CUDA_FORCE_CUBLAS cannot be compiled for hipblas (#8513, closed Jul 16, 2024)
- Support for H2O Danube3 Family of Models (#8518, closed Jul 16, 2024)
- lliblama.so is missing (#8512, closed Jul 16, 2024)
- Bug: Qwen-2-7b-q8 and Qwen-2-7b-instruct-q8 giving weird output when run with CUDA support (#8503, closed Jul 16, 2024)
- Spaces are not being added after added tokens when `legacy: true` is used (#7094, closed Jul 16, 2024)
- error: implicit declaration of function 'vld1q_s8_x4'; did you mean 'vld1q_s8_x2'? (#7147, closed Jul 16, 2024)
- [Bug/Enhancement] Can't disable continuous batching (#8464, closed Jul 15, 2024)
- Bug: null-pointer dereference in examples/gguf/gguf.cpp gguf_ex_read_0 and gguf_ex_read_1 (#8486, closed Jul 15, 2024)
- Vulkan backend regression: gibberish output when layers offloaded to GPU (#8092, closed Jul 15, 2024)
- bad command line parsing behaviour with some filenames (#6163, closed Jul 15, 2024)
- Bug: pydantic_models_to_grammar_examples.py is broken (#8471, closed Jul 14, 2024)
- corruption on slot context shift (#6002, closed Jul 14, 2024)
- Fails to run in SYCL mode (#6528, closed Jul 14, 2024)
- Question: How to convert Yi-34B-Chat-4bits to gguf? (#7623, closed Jul 14, 2024)
- Bug: server crashed today for the first time (#7637, closed Jul 14, 2024)
- Add Support for Solidity Model (#7653, closed Jul 14, 2024)
- llama-b3376-bin-win-cuda-cu12.2.0-x64 missing dlls (#8443, closed Jul 13, 2024)
- main : failed to eval (#8458, closed Jul 13, 2024)
- can llama.cpp/convert.py support a tokenizer other than 'spm', 'bpe', 'hfft'? (#6690, closed Jul 13, 2024)
- failed to quantize: ios_base::clear: unspecified iostream_category error (#6945, closed Jul 13, 2024)
- Old GGUFs have broken tokenization and there is no warning (#7476, closed Jul 13, 2024)
- Incremental learning (#7511, closed Jul 13, 2024)
- finetune error: ggml_flash_attn_ext() not yet supported (#7523, closed Jul 13, 2024)
- Add support for accelerating with QNN on Windows on ARM (#7541, closed Jul 13, 2024)
- Bug: Inconsistent ggml-4-x86-cuda-v100 ci failures on master (#7613, closed Jul 13, 2024)
- Feature Request: Improve Ergonomics of `llama-server` (#7619, closed Jul 13, 2024)
- Feature Request: codestral support (#7622, closed Jul 13, 2024)
- Bug: Misplaced docs/token_generation_performance_tips.md or link broken (#8381, closed Jul 12, 2024)
- Bug: JSON Schema-to-GBNF additionalProperties bugs (and other minor quirks) (#7789, closed Jul 12, 2024)
- server : support content array in OAI chat API (#8367, closed Jul 12, 2024)
- Bug: std::out_of_range error for codegeex4-all-9b-GGUF (#8438, closed Jul 12, 2024)
- Bug: llama-server crashes when started with --embeddings (#8076, closed Jul 12, 2024)
- Freshly converted PLaMo fails assertion: vocab.id_to_token.size() == vocab.token_to_id.size() (#5669, closed Jul 12, 2024)
- Binaries starting with b2715 don't work on Intel Mac anymore (#7110, closed Jul 12, 2024)
- Feature: writing conversation to a txt file (#7545, closed Jul 12, 2024)
- Question: how to make main work with my M3 E-cores instead of P-cores (#7577, closed Jul 12, 2024)
- Adding NVPL BLAS support (#8329, closed Jul 11, 2024)
38 Issues opened by 32 people

- Bug: failing ggml.c:12621: ne2 == ne02 during finetuning (#8564, opened Jul 18, 2024)
- Feature Request: Add support for new model conversion (#8563, opened Jul 18, 2024)
- Bug: Vulkan build no longer working with MSVC cmake on windows (#8562, opened Jul 18, 2024)
- Feature Request: Pull from Ollama repo (#8560, opened Jul 18, 2024)
- Bug: GLM4 9b produces wrong results with llama-server (#8558, opened Jul 18, 2024)
- Feature Request: support reranking API endpoint and models (#8555, opened Jul 18, 2024)
- Bug: After updating the docker image, legacy models began issuing an EOS token at the end of generation (#8545, opened Jul 17, 2024)
- Bug: python3 convert.py [Errno 2] No such file or directory (#8544, opened Jul 17, 2024)
- Bug: RPC server doesn't load GPU if I use Vulkan (#8536, opened Jul 17, 2024)
- Bug: Docker build warnings (#8535, opened Jul 17, 2024)
- Feature Request: Architecture "LlavaMistralForCausalLM" not supported! (#8533, opened Jul 17, 2024)
- Feature Request: Add support for Lite-Mistral-Instruct chat template (#8529, opened Jul 17, 2024)
- Bug: Can't quantize 405B Mega merge (#8528, opened Jul 17, 2024)
- Feature Request: Support Codestral Mamba (#8519, opened Jul 16, 2024)
- Newest apple model unsupported... (#8514, opened Jul 16, 2024)
- Llama.cpp release notes lacking descriptions on the github.com page (#8509, opened Jul 16, 2024)
- Run Llama.cpp in silent mode (#8507, opened Jul 16, 2024)
- Bug: ROCm CUDA error (#8504, opened Jul 16, 2024)
- Bug: Weird output from llama-speculative (#8499, opened Jul 16, 2024)
- Bug: GGML_HIP_UMA causes consistency errors (#8496, opened Jul 15, 2024)
- Bug: MESA: error: ../src/intel/vulkan/anv_device.c:4237: VK_ERROR_OUT_OF_DEVICE_MEMORY (#8492, opened Jul 15, 2024)
- Bug: gemma2 perplexity pending forever (#8490, opened Jul 15, 2024)
- Bug: Can't build vulkan backend on RISC-V platform anymore (#8488, opened Jul 15, 2024)
- Feature Request: Hope to support Qwen VL (#8487, opened Jul 15, 2024)
- Feature Request: T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge (#8485, opened Jul 15, 2024)
- Feature Request: Improve Gemma v2 model performance on Vulkan backend (#8476, opened Jul 14, 2024)
- llama-cli chat templates ignored? (#8469, opened Jul 13, 2024)
- glm-4-9b-chat-1m model issue: wrong shape (#8467, opened Jul 13, 2024)
- tiktoken package missing from requirements (#8466, opened Jul 13, 2024)
- Bug: instruct models don't work with latest llama.cpp (#8463, opened Jul 12, 2024)
- Bug: mmproj from LLaVA 1.6 (spatial_unpad) seems to be broken (#8457, opened Jul 12, 2024)
- Bug: llama.cpp with Vulkan not running on Snapdragon X + Windows (Copilot+PCs) (#8455, opened Jul 12, 2024)
- Feature Request: Drop dependency on cublas library on build / TinyBLAS support (#8452, opened Jul 12, 2024)
- Unable to convert a fireworks ai model to GGUF with gguf-my-repo (#8451, opened Jul 12, 2024)
- Bug: ggml-aarch64.c does not compile on Windows ARM64 with MSVC (#8446, opened Jul 12, 2024)
85 Unresolved conversations

Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

- feat: Support Moore Threads GPU (#8383, commented on Jul 16, 2024, 16 new comments)
- Add example script for rendering jinja2 templates (#7246, commented on Jul 15, 2024, 10 new comments)
- support MiniCPM-V-2.5 (#7599, commented on Jul 18, 2024, 7 new comments)
- Implemented Spellcheck for Llama.cpp (#7884, commented on Jul 12, 2024, 3 new comments)
- server: Windows 7 compatibility (#8208, commented on Jul 17, 2024, 2 new comments)
- ggml: avoid rebuild of GGML graph for each token (#7456) (#8366, commented on Jul 15, 2024, 1 new comment)
- convert_hf_to_gguf.py, convert_hf_to_gguf_update.py: Added Ukrainian tokens into string (#8435, commented on Jul 13, 2024, 0 new comments)
- Bug: [SYCL] Inference not working correctly on multiple GPUs (#8294, commented on Jul 18, 2024, 0 new comments)
- Recoverable Error Handling (#4385, commented on Jul 18, 2024, 0 new comments)
- Bug: InternLM 2.5 Chat Tool Calls: Incorrect and Inconsistent Formatting (#8405, commented on Jul 18, 2024, 0 new comments)
- SIMD Everywhere (#7983, commented on Jul 18, 2024, 0 new comments)
- [SYCL] Implement Flash attention (#7141, commented on Jul 18, 2024, 0 new comments)
- Bug: Random output from llama-cli in chat mode (#7929, commented on Jul 18, 2024, 0 new comments)
- Facing issue while converting finetuned LLaVA Mistral model to gguf (#7963, commented on Jul 18, 2024, 0 new comments)
- Bug: Unable to call llama.cpp inference server with llama 3 model (#7978, commented on Jul 18, 2024, 0 new comments)
- ggml-cuda.so is 90mb with -arch=all (#7156, commented on Jul 17, 2024, 0 new comments)
- llama : create llamax library (#5215, commented on Jul 17, 2024, 0 new comments)
- Bug: Error when trying to use `./llama-gguf-split --merge` to merge split model gguf files back (#8264, commented on Jul 17, 2024, 0 new comments)
- [Feature request] Any plans for AMD XDNA AI Engine support on Ryzen 7x40 processors? (#1499, commented on Jul 17, 2024, 0 new comments)
- Question: Why do GPU and CPU embedding outputs differ for the same input? Is this normal? (#7608, commented on Jul 17, 2024, 0 new comments)
- llama : support Mamba-2 (#7727, commented on Jul 17, 2024, 0 new comments)
- Bug: CUDA error: out of memory - Phi-3 Mini 128k prompted with 20k+ tokens on 4GB GPU (#7885, commented on Jul 17, 2024, 0 new comments)
- How to properly serve Gemma 7b? (#7952, commented on Jul 17, 2024, 0 new comments)
- Latest vulkan version doesn't follow instructions (#7965, commented on Jul 17, 2024, 0 new comments)
- Bug: Phi-3 Tokenizer Adds Whitespaces on re-tokenization (which invalidates KV-cache) (#7938, commented on Jul 16, 2024, 0 new comments)
- Add multiple derived adaptions hosting (#8415, commented on Jul 16, 2024, 0 new comments)
- ggml : reading the runtime sve config of the cpu (#8382, commented on Jul 13, 2024, 0 new comments)
- Tokenizer fixes (#8379, commented on Jul 13, 2024, 0 new comments)
- server: Update public_simplechat/datautils.mjs (#8362, commented on Jul 13, 2024, 0 new comments)
- server : avoid breaking KV cache when prompt >= n_ctx (#6855) (#8359, commented on Jul 13, 2024, 0 new comments)
- Adding models to the list in convert-hf-to-gguf-update.py (#8357, commented on Jul 13, 2024, 0 new comments)
- fix and speed up compilation (#8354, commented on Jul 13, 2024, 0 new comments)
- build example/main.cpp as shared library and intercept token printing using FFI (#8339, commented on Jul 13, 2024, 0 new comments)
- llama.swiftui: Fix a small bug (#8268, commented on Jul 15, 2024, 0 new comments)
- clip: don't throw exceptions from llava functions compiled as extern "C" (#8210, commented on Jul 12, 2024, 0 new comments)
- Embed files (#8121, commented on Jul 15, 2024, 0 new comments)
- ggml-cuda: Adding support for unified memory (#8035, commented on Jul 16, 2024, 0 new comments)
- Add Intel Advanced Matrix Extensions (AMX) support to ggml (#7707, commented on Jul 18, 2024, 0 new comments)
- Introduce Q8_0 and Q4_0 with Bf16 delta values (#7497, commented on Jul 15, 2024, 0 new comments)
- ggml-qnn: add Qualcomm QNN (Qualcomm Neural Network, aka Qualcomm AI Engine Direct) backend (#6869, commented on Jul 18, 2024, 0 new comments)
- added implementation of DRY sampler (#6839, commented on Jul 12, 2024, 0 new comments)
- Xeon Phi (Knights Corner) Support (#6440, commented on Jul 11, 2024, 0 new comments)
- Feature Request: Support for Meta Chameleon 7B and 34B (#7995, commented on Jul 18, 2024, 0 new comments)
- Feature Request: Add VideoLLaMA2 support (#7900, commented on Jul 13, 2024, 0 new comments)
- Bug: Yi 1.5 segmentation fault (#8369, commented on Jul 12, 2024, 0 new comments)
- Bug: ggml/src/ggml.c: In function 'ggml_vec_mad_f16' (#8378, commented on Jul 12, 2024, 0 new comments)
- Bug: Vulkan backend does not work on an Imagination GPU on a RISC-V platform (#8437, commented on Jul 12, 2024, 0 new comments)
- ggml : add DirectML backend (#7772, commented on Jul 12, 2024, 0 new comments)
- train a model from scratch, but f16 or q8 (#8429, commented on Jul 12, 2024, 0 new comments)
- Llama only uses dedicated memory when both shared and dedicated are available (#6743, commented on Jul 12, 2024, 0 new comments)
- Can't run the program (#7181, commented on Jul 12, 2024, 0 new comments)
- Feature Request: Add vocabulary type for token-free models that work on raw bytes (#7763, commented on Jul 12, 2024, 0 new comments)
- Bug: `scripts/run-with-preset.py` fails on `--tensor-split` option when run on non-GPU-enabled system (#7864, commented on Jul 12, 2024, 0 new comments)
- Bug: Using llama-b3091-bin-win-llvm-arm64.zip to run qwen2-0_5b-instruct-q8_0.gguf fails to start. Is it a compilation error in llama-b3091-bin-win-llvm-arm64.zip? (#7873, commented on Jul 12, 2024, 0 new comments)
- Bug: Random output after the last update (#7874, commented on Jul 12, 2024, 0 new comments)
- Feature Request: Add Paligemma support (#7875, commented on Jul 12, 2024, 0 new comments)
- Bug: multithreading for requests, model infer service failed (#7876, commented on Jul 12, 2024, 0 new comments)
- Bug: get-wikitext-103.sh seems not to be working (#7878, commented on Jul 12, 2024, 0 new comments)
- Support for MatMul-free LLMs (#7889, commented on Jul 12, 2024, 0 new comments)
- Investigate gemma 2 generation quality (#8240, commented on Jul 11, 2024, 0 new comments)
- Feature Request: Support for Meta: Multi Token Prediction Models (#8297, commented on Jul 11, 2024, 0 new comments)
- Bug: Missing Port Binding in Docker Run Command (#8419, commented on Jul 11, 2024, 0 new comments)
- Bug: Fatal signal 11 (SIGSEGV) on Google Pixel 8 (dart) (#7908, commented on Jul 11, 2024, 0 new comments)
- Bug: rpc-server --mem doesn't match backend memory (#8417, commented on Jul 11, 2024, 0 new comments)
- Bug: Qwen2-72B-Instruct (and finetunes) Q4_K_M, Q5_K_M generates random output with CuBLAS prompt processing (#8025, commented on Jul 16, 2024, 0 new comments)
- How to evaluate my converted gguf model? Which benchmarks can I run, and how do I run them on my converted model? (#8409, commented on Jul 16, 2024, 0 new comments)
- Server UI: Code snippets are being mangled by <em> italic emphasis replacement (#7023, commented on Jul 16, 2024, 0 new comments)
- Bug: The "server" provided web-ui chat sometimes does not properly quote "<" ">" characters in its HTML output (#7905, commented on Jul 16, 2024, 0 new comments)
- Feature Request: Nemotron-4-340B-Instruct Support (#7966, commented on Jul 15, 2024, 0 new comments)
- Bug: QWEN2 quantization GGML_ASSERT (#7805, commented on Jul 15, 2024, 0 new comments)
- convert.py still fails on llama3 8B-Instruct downloaded directly from Meta (Huggingface works) (#7339, commented on Jul 15, 2024, 0 new comments)
- Question: why does the llama.cpp mobilevlm model (fp16) inference result differ from the official pytorch project's results? Is this normal? (#7614, commented on Jul 15, 2024, 0 new comments)
- Bug: convert-hf-to-gguf.py on Gemma model: ValueError: Duplicated key name 'tokenizer.chat_template' (#7923, commented on Jul 15, 2024, 0 new comments)
- examples/server: "New UI" chat becomes slower with each subsequent message (#7944, commented on Jul 15, 2024, 0 new comments)
- Add support for InternLM 2.5 1M context; should be as good as command r+ (#8285, commented on Jul 14, 2024, 0 new comments)
- Regressions on IQ3_XXS over time (#5856, commented on Jul 14, 2024, 0 new comments)
- Feature Request: tokenized history (#7744, commented on Jul 14, 2024, 0 new comments)
- Any idea of PyramidKV? (#7916, commented on Jul 14, 2024, 0 new comments)
- Bug: Outdated documentation of train-text-from-scratch (#7917, commented on Jul 14, 2024, 0 new comments)
- Bug: Error while converting BERT to GGUF: Can not map tensor 'bert.embeddings.LayerNorm.beta' (#7924, commented on Jul 14, 2024, 0 new comments)
- Why is the single input used incorrect, or no output? (#8276, commented on Jul 13, 2024, 0 new comments)
- Bug: Phi-2 model tokenizer not recognized (#7667, commented on Jul 13, 2024, 0 new comments)
- I am running two socket servers, and the CPU usage is at 50% (#7812, commented on Jul 13, 2024, 0 new comments)
- ci : self-hosted runner issue (#7893, commented on Jul 13, 2024, 0 new comments)
- Bug: convert-hf-to-gguf.py fails for Gemma models (#7897, commented on Jul 13, 2024, 0 new comments)