Insights: ggerganov/llama.cpp
Overview
26 Releases published by 1 person

- Jul 11, 2024: b3373, b3374, b3375
- Jul 12, 2024: b3376, b3378, b3381, b3382, b3383, b3384
- Jul 13, 2024: b3385, b3386
- Jul 14, 2024: b3387, b3389
- Jul 15, 2024: b3392, b3393, b3394, b3396, b3398, b3400
- Jul 16, 2024: b3402, b3403, b3405
- Jul 17, 2024: b3406, b3407, b3408
- Jul 18, 2024: b3412
40 Pull requests merged by 22 people

- server: use relative routes for static files in new UI (#8552, merged Jul 18, 2024)
- convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499, merged Jul 18, 2024)
- fix special not working for llama-server (#8553, merged Jul 18, 2024)
- lookup: fibonacci hashing, fix crashes (#8548, merged Jul 17, 2024; see the first sketch after this list)
- build : Fix docker build warnings (#8535) (#8537, merged Jul 17, 2024)
- Update CONTRIBUTING.md to remove mention of noci (#8541, merged Jul 17, 2024)
- [CANN] Add Ascend NPU backend (#6035, merged Jul 17, 2024)
- batched: fix n_predict parameter (#8527, merged Jul 17, 2024)
- llama : disable context-shift for DeepSeek v2 (#8501, merged Jul 17, 2024)
- make/cmake: add missing force MMQ/cuBLAS for HIP (#8515, merged Jul 16, 2024)
- Update clib.json to point to Cyan4973 original xxhash repo (#8491, merged Jul 16, 2024)
- handle export-lora help argument (#8497, merged Jul 16, 2024)
- llama : valign + remove unused ftype (#8502, merged Jul 16, 2024)
- convert_hf : faster lazy safetensors (#8482, merged Jul 16, 2024)
- Refactor lora adapter support (#8332, merged Jul 15, 2024)
- fix ci (#8494, merged Jul 15, 2024)
- ggml : suppress unknown pragma 'GCC' on windows (#8460, merged Jul 15, 2024)
- server: update README.md with llama-server --help's output [no ci] (#8472, merged Jul 15, 2024)
- docs: fix links in development docs [no ci] (#8481, merged Jul 15, 2024)
- [SYCL] add concat through dim 1/2 (#8483, merged Jul 15, 2024)
- Vulkan MMQ Fix (#8479, merged Jul 15, 2024)
- pydantic : replace uses of __annotations__ with get_type_hints (#8474, merged Jul 14, 2024)
- nix: update flake.lock (#8475, merged Jul 14, 2024)
- llama : fix Gemma-2 Query scaling factors (#8473, merged Jul 14, 2024)
- gguf_hash.py: Add sha256 (#8470, merged Jul 14, 2024)
- llama : fix pre-tokenization of non-special added tokens (#8228, merged Jul 14, 2024)
- vulkan : cmake integration (#8119, merged Jul 13, 2024)
- metal : template-ify some of the kernels (#8447, merged Jul 13, 2024)
- server : handle content array in chat API (#8449, merged Jul 12, 2024)
- main : print error on empty input (#8456, merged Jul 12, 2024)
- llama : suppress unary minus operator warning (#8448, merged Jul 12, 2024)
- server: Ensure batches are either all embed or all completion (#8076) (#8420, merged Jul 12, 2024)
- Docker: Fix filename for convert-hf-to-gguf.py in tools.sh (#8441, merged Jul 12, 2024)
- Removing fsep token from GPTRefactForCausalLM (#8237, merged Jul 12, 2024)
- examples : sprintf -> snprintf (#8434, merged Jul 12, 2024; see the second sketch after this list)
- ggml : minor naming changes (#8433, merged Jul 12, 2024)
- [SYCL] fix the mul_mat_id ut issues (#8427, merged Jul 12, 2024)
- ggml : add NVPL BLAS support (#8329) (#8425, merged Jul 11, 2024)
- cuda : suppress 'noreturn' warn in no_device_code (#8414, merged Jul 11, 2024)
- CUDA: optimize and refactor MMQ (#8416, merged Jul 11, 2024)
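Two of the merged PRs above name standard C/C++ techniques that are worth a quick illustration. First, #8548 moves the lookup decoding example to Fibonacci hashing. Below is a minimal, self-contained sketch of that hashing scheme; it is not the PR's actual code, and the helper name, key type, and table size are invented for the example:

```cpp
// Sketch of Fibonacci hashing (the technique named in #8548), not the PR's
// actual code. The idea: multiply the key by floor(2^64 / phi) and keep the
// top bits. The multiplication spreads low-entropy keys (such as small
// sequential token ids) far more evenly across buckets than key % size.
#include <cstdint>
#include <cstdio>

// floor(2^64 / phi), phi = (1 + sqrt(5)) / 2; also written 0x9E3779B97F4A7C15
constexpr uint64_t FIB_MULT = 11400714819323198485ull;

// Hypothetical helper: map a 64-bit key into a table of 2^bits buckets.
static uint32_t fib_hash(uint64_t key, unsigned bits) {
    return (uint32_t)((key * FIB_MULT) >> (64 - bits));
}

int main() {
    // Sequential keys land in widely separated buckets instead of clustering.
    for (uint64_t key = 1; key <= 4; ++key) {
        printf("key %llu -> bucket %u\n",
               (unsigned long long)key, fib_hash(key, 10));
    }
    return 0;
}
```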
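Second, #8434 converts the examples from sprintf to snprintf. A small hedged sketch of the general pattern behind such a change (again, not the PR's actual diff; the buffer and format string are made up):

```cpp
// snprintf is bounded by the destination size, always NUL-terminates when
// size > 0, and returns the length it *wanted* to write, so truncation is
// detectable instead of becoming a buffer overflow.
#include <cstdio>

int main() {
    char buf[16];

    // Unsafe form (what such a change removes): sprintf(buf, "tensor_%d", idx);
    // a large idx silently writes past the end of buf.

    // Bounded form: "tensor_123456789" needs 17 bytes, so it is truncated
    // and the return value reveals it.
    int n = snprintf(buf, sizeof(buf), "tensor_%d", 123456789);
    if (n < 0 || n >= (int)sizeof(buf)) {
        fprintf(stderr, "warning: output truncated\n");
    }
    printf("%s\n", buf);
    return 0;
}
```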
19 Pull requests opened by 13 people

- metal : add BF16 support (#8439, opened Jul 11, 2024)
- chore : Fix vulkan related compiler warnings, add help text, improve CLI options (#8477, opened Jul 14, 2024)
- ggml: Install all public headers in the ggml build regardless of build settings (#8480, opened Jul 14, 2024)
- llama : change fallback type IQ4_NL -> Q4_0 (#8489, opened Jul 15, 2024)
- examples : Rewrite pydantic_models_to_grammar_examples.py (#8493, opened Jul 15, 2024)
- CUDA: MMQ code deduplication + iquant support (#8495, opened Jul 15, 2024)
- docs: added AI Studio to the list of UIs [no ci] (#8505, opened Jul 16, 2024)
- llama : refactoring (#8508, opened Jul 16, 2024)
- llama : simplify Mamba with advanced batch splits (#8526, opened Jul 17, 2024)
- llama : bump max layers from 256 to 512 (#8530, opened Jul 17, 2024)
- Improvements for running on Windows with Snapdragon X (#8531, opened Jul 17, 2024)
- README.md: include steps to run cmake [no ci] (#8540, opened Jul 17, 2024)
- CPU/CUDA: Gemma 2 FlashAttention support (#8542, opened Jul 17, 2024)
- Add support for Chameleon (#8543, opened Jul 17, 2024)
- ggml : add SSM Metal kernels (#8546, opened Jul 17, 2024)
- ggml : fix iq4_nl dot product with odd number of blocks (#8549, opened Jul 17, 2024)
- [SYCL] fix multi-gpu issue on sycl (#8554, opened Jul 18, 2024)
- ggml : fix odd blocks for ARM_NEON (#8556, opened Jul 18, 2024)
- convert-*.py: autogenerate general.uuid if missing (#8565, opened Jul 18, 2024)
56 Issues closed by 15 people

- Support more AMD GPUs like `gfx90c` (#6110, closed Jul 18, 2024)
- Feature request: Graphical GGUF viewer (#6715, closed Jul 18, 2024)
- Server: Multimodal Model Input Parameter No Longer Exists (#7112, closed Jul 18, 2024)
- Bug: Incorrect memory allocation when mixing Nvidia and AMD GPUs (#7674, closed Jul 18, 2024)
- Bug: Could NOT find BLAS (missing: BLAS_LIBRARIES) (#7708, closed Jul 18, 2024)
- Bug: With a small n_ctx_slot, llama.cpp begins producing gibberish (#8498, closed Jul 17, 2024)
- Bug: Failed to load model (#8516, closed Jul 17, 2024)
- Este set (#8523, closed Jul 17, 2024)
- <?xml version="1.0" encoding="UTF-8"?> (#8524, closed Jul 17, 2024)
- Segmentation fault on finetune with -ngl > 0, Debian 12 stable (#6994, closed Jul 17, 2024)
- Feature: support Vulkan devices that don't support 16-bit storage (#7620, closed Jul 17, 2024)
- Bug: value of keep alive max count in cpp-httplib hardcoded too low (#7694, closed Jul 17, 2024)
- Feature Request: GGUF 2 BIN (#7695, closed Jul 17, 2024)
- Feature Request: Prevent server.exe from being detected as Trojan:Win32/Wacatac.B!ml (#7704, closed Jul 17, 2024)
- Bug: GGML_CUDA_FORCE_CUBLAS cannot be compiled for hipblas (#8513, closed Jul 16, 2024)
- Support for H2O Danube3 Family of Models (#8518, closed Jul 16, 2024)
- lliblama.so is missing (#8512, closed Jul 16, 2024)
- Bug: Qwen-2-7b-q8 and Qwen-2-7b-instruct-q8 giving weird output when run with CUDA support (#8503, closed Jul 16, 2024)
- Spaces are not being added after added tokens when `legacy: true` is used (#7094, closed Jul 16, 2024)
- error: implicit declaration of function 'vld1q_s8_x4'; did you mean 'vld1q_s8_x2'? (#7147, closed Jul 16, 2024)
- [Bug/Enhancement] Can't disable continuous batching (#8464, closed Jul 15, 2024)
- Bug: null-pointer dereference in examples/gguf/gguf.cpp gguf_ex_read_0 and gguf_ex_read_1 (#8486, closed Jul 15, 2024)
- Vulkan backend regression: gibberish output when layers offloaded to GPU (#8092, closed Jul 15, 2024)
- bad command line parsing behaviour with some filenames (#6163, closed Jul 15, 2024)
- Bug: pydantic_models_to_grammar_examples.py is broken (#8471, closed Jul 14, 2024)
- corruption on slot context shift (#6002, closed Jul 14, 2024)
- Fails to run in SYCL mode (#6528, closed Jul 14, 2024)
- Question: How to convert Yi-34B-Chat-4bits to gguf? (#7623, closed Jul 14, 2024)
- Bug: server crashed today for the first time (#7637, closed Jul 14, 2024)
- Add Support for Solidity Model (#7653, closed Jul 14, 2024)
- llama-b3376-bin-win-cuda-cu12.2.0-x64 missing dlls (#8443, closed Jul 13, 2024)
- main : failed to eval (#8458, closed Jul 13, 2024)
- can llama.cpp/convert.py support a tokenizer other than 'spm', 'bpe', 'hfft'? (#6690, closed Jul 13, 2024)
- failed to quantize: ios_base::clear: unspecified iostream_category error (#6945, closed Jul 13, 2024)
- Old GGUFs have broken tokenization and there is no warning (#7476, closed Jul 13, 2024)
- Incremental learning (#7511, closed Jul 13, 2024)
- finetune error: ggml_flash_attn_ext() not yet supported (#7523, closed Jul 13, 2024)
- Add support for accelerating with QNN on Windows on ARM (#7541, closed Jul 13, 2024)
- Bug: Inconsistent ggml-4-x86-cuda-v100 ci failures on master (#7613, closed Jul 13, 2024)
- Feature Request: Improve Ergonomics of `llama-server` (#7619, closed Jul 13, 2024)
- Feature Request: codestral support (#7622, closed Jul 13, 2024)
- Bug: Misplaced docs/token_generation_performance_tips.md or link broken (#8381, closed Jul 12, 2024)
- Bug: JSON Schema-to-GBNF additionalProperties bugs (and other minor quirks) (#7789, closed Jul 12, 2024)
- server : support content array in OAI chat API (#8367, closed Jul 12, 2024)
- Bug: std::out_of_range error for codegeex4-all-9b-GGUF (#8438, closed Jul 12, 2024)
- Bug: llama-server crashes when started with --embeddings (#8076, closed Jul 12, 2024)
- Freshly converted PLaMo fails assertion: vocab.id_to_token.size() == vocab.token_to_id.size() (#5669, closed Jul 12, 2024)
- Binaries starting with b2715 don't work on Intel Mac anymore (#7110, closed Jul 12, 2024)
- Feature: writing conversation to a txt file (#7545, closed Jul 12, 2024)
- Question: how to make main work with my M3 E-cores instead of P-cores (#7577, closed Jul 12, 2024)
- Adding NVPL BLAS support (#8329, closed Jul 11, 2024)
38 Issues opened by 32 people

- Bug: failing ggml.c:12621: ne2 == ne02 during finetuning (#8564, opened Jul 18, 2024)
- Feature Request: Add support for new model conversion (#8563, opened Jul 18, 2024)
- Bug: Vulkan build no longer working with MSVC cmake on windows (#8562, opened Jul 18, 2024)
- Feature Request: Pull from Ollama repo (#8560, opened Jul 18, 2024)
- Bug: GLM4 9b produces wrong results with llama-server (#8558, opened Jul 18, 2024)
- Feature Request: support reranking API endpoint and models (#8555, opened Jul 18, 2024)
- Bug: After updating the docker image, legacy models began issuing an EOS token at the end of generation (#8545, opened Jul 17, 2024)
- Bug: python3 convert.py [Errno 2] No such file or directory (#8544, opened Jul 17, 2024)
- Bug: RPC server doesn't load GPU if I use Vulkan (#8536, opened Jul 17, 2024)
- Bug: Docker build warnings (#8535, opened Jul 17, 2024)
- Feature Request: Architecture "LlavaMistralForCausalLM" not supported! (#8533, opened Jul 17, 2024)
- Feature Request: Add support for Lite-Mistral-Instruct chat template (#8529, opened Jul 17, 2024)
- Bug: Can't quantize 405B Mega merge (#8528, opened Jul 17, 2024)
- Feature Request: Support Codestral Mamba (#8519, opened Jul 16, 2024)
- Newest apple model unsupported... (#8514, opened Jul 16, 2024)
- Llama.cpp release notes lacking descriptions on the github.com page (#8509, opened Jul 16, 2024)
- Run Llama.cpp in silent mode (#8507, opened Jul 16, 2024)
- Bug: ROCm CUDA error (#8504, opened Jul 16, 2024)
- Bug: Weird output from llama-speculative (#8499, opened Jul 16, 2024)
- Bug: GGML_HIP_UMA causes consistency errors (#8496, opened Jul 15, 2024)
- Bug: MESA: error: ../src/intel/vulkan/anv_device.c:4237: VK_ERROR_OUT_OF_DEVICE_MEMORY (#8492, opened Jul 15, 2024)
- Bug: gemma2 perplexity pending forever (#8490, opened Jul 15, 2024)
- Bug: Can't build vulkan backend on RISC-V platform anymore (#8488, opened Jul 15, 2024)
- Feature Request: Hope to support Qwen VL (#8487, opened Jul 15, 2024)
- Feature Request: T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge (#8485, opened Jul 15, 2024)
- Feature Request: Improve Gemma v2 model performance on Vulkan backend (#8476, opened Jul 14, 2024)
- llama-cli chat templates ignored? (#8469, opened Jul 13, 2024)
- glm-4-9b-chat-1m model issue: wrong shape (#8467, opened Jul 13, 2024)
- tiktoken package missing from requirements (#8466, opened Jul 13, 2024)
- Bug: instruct models don't work with latest llama.cpp (#8463, opened Jul 12, 2024)
- Bug: mmproj from LLaVA 1.6 (spatial_unpad) seems to be broken (#8457, opened Jul 12, 2024)
- Bug: llama.cpp with Vulkan not running on Snapdragon X + Windows (Copilot+PCs) (#8455, opened Jul 12, 2024)
- Feature Request: Drop dependency on cublas library on build / TinyBLAS support (#8452, opened Jul 12, 2024)
- Unable to convert a fireworks ai model to GGUF with gguf-my-repo (#8451, opened Jul 12, 2024)
- Bug: ggml-aarch64.c does not compile on Windows ARM64 with MSVC (#8446, opened Jul 12, 2024)
85 Unresolved conversations

Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

- feat: Support Moore Threads GPU (#8383, commented on Jul 16, 2024, 16 new comments)
- Add example script for rendering jinja2 templates (#7246, commented on Jul 15, 2024, 10 new comments)
- support MiniCPM-V-2.5 (#7599, commented on Jul 18, 2024, 7 new comments)
- Implemented Spellcheck for Llama.cpp (#7884, commented on Jul 12, 2024, 3 new comments)
- server: Windows 7 compatibility (#8208, commented on Jul 17, 2024, 2 new comments)
- ggml: avoid rebuild of GGML graph for each token (#7456) (#8366, commented on Jul 15, 2024, 1 new comment)
- convert_hf_to_gguf.py, convert_hf_to_gguf_update.py: Added Ukrainian tokens into string (#8435, commented on Jul 13, 2024, 0 new comments)
- Bug: [SYCL] Inference not working correctly on multiple GPUs (#8294, commented on Jul 18, 2024, 0 new comments)
- Recoverable Error Handling (#4385, commented on Jul 18, 2024, 0 new comments)
- Bug: InternLM 2.5 Chat Tool Calls: Incorrect and Inconsistent Formatting (#8405, commented on Jul 18, 2024, 0 new comments)
- SIMD Everywhere (#7983, commented on Jul 18, 2024, 0 new comments)
- [SYCL] Implement Flash attention (#7141, commented on Jul 18, 2024, 0 new comments)
- Bug: Random output from llama-cli in chat mode (#7929, commented on Jul 18, 2024, 0 new comments)
- Facing issue while converting finetuned LLaVA Mistral model to gguf (#7963, commented on Jul 18, 2024, 0 new comments)
- Bug: Unable to call llama.cpp inference server with llama 3 model (#7978, commented on Jul 18, 2024, 0 new comments)
- ggml-cuda.so is 90mb with -arch=all (#7156, commented on Jul 17, 2024, 0 new comments)
- llama : create llamax library (#5215, commented on Jul 17, 2024, 0 new comments)
- Bug: Error when trying to use `./llama-gguf-split --merge` to merge split model gguf files back (#8264, commented on Jul 17, 2024, 0 new comments)
- [Feature request] Any plans for AMD XDNA AI Engine support on Ryzen 7x40 processors? (#1499, commented on Jul 17, 2024, 0 new comments)
- Question: Why do GPU and CPU embedding outputs differ for the same input? Is this normal? (#7608, commented on Jul 17, 2024, 0 new comments)
- llama : support Mamba-2 (#7727, commented on Jul 17, 2024, 0 new comments)
- Bug: CUDA error: out of memory - Phi-3 Mini 128k prompted with 20k+ tokens on 4GB GPU (#7885, commented on Jul 17, 2024, 0 new comments)
- How to properly serve Gemma 7b? (#7952, commented on Jul 17, 2024, 0 new comments)
- Latest vulkan version doesn't follow instructions (#7965, commented on Jul 17, 2024, 0 new comments)
- Bug: Phi-3 Tokenizer Adds Whitespaces on re-tokenization (which invalidates KV-cache) (#7938, commented on Jul 16, 2024, 0 new comments)
- Add multiple derived adaptions hosting (#8415, commented on Jul 16, 2024, 0 new comments)
- ggml : reading the runtime sve config of the cpu (#8382, commented on Jul 13, 2024, 0 new comments)
- Tokenizer fixes (#8379, commented on Jul 13, 2024, 0 new comments)
- server: Update public_simplechat/datautils.mjs (#8362, commented on Jul 13, 2024, 0 new comments)
- server : avoid breaking KV cache when prompt >= n_ctx (#6855) (#8359, commented on Jul 13, 2024, 0 new comments)
- Adding models to the list in convert-hf-to-gguf-update.py (#8357, commented on Jul 13, 2024, 0 new comments)
- fix and speed up compilation (#8354, commented on Jul 13, 2024, 0 new comments)
- build example/main.cpp as shared library and intercept token printing using FFI (#8339, commented on Jul 13, 2024, 0 new comments)
- llama.swiftui: Fix a small bug (#8268, commented on Jul 15, 2024, 0 new comments)
- clip: don't throw exceptions from llava functions compiled as extern "C" (#8210, commented on Jul 12, 2024, 0 new comments)
- Embed files (#8121, commented on Jul 15, 2024, 0 new comments)
- ggml-cuda: Adding support for unified memory (#8035, commented on Jul 16, 2024, 0 new comments)
- Add Intel Advanced Matrix Extensions (AMX) support to ggml (#7707, commented on Jul 18, 2024, 0 new comments)
- Introduce Q8_0 and Q4_0 with Bf16 delta values (#7497, commented on Jul 15, 2024, 0 new comments)
- ggml-qnn: add Qualcomm QNN (Qualcomm Neural Network, aka Qualcomm AI Engine Direct) backend (#6869, commented on Jul 18, 2024, 0 new comments)
- added implementation of DRY sampler (#6839, commented on Jul 12, 2024, 0 new comments)
- Xeon Phi (Knights Corner) Support (#6440, commented on Jul 11, 2024, 0 new comments)
- Feature Request: Support for Meta Chameleon 7B and 34B (#7995, commented on Jul 18, 2024, 0 new comments)
- Feature Request: Add VideoLLaMA2 support (#7900, commented on Jul 13, 2024, 0 new comments)
- Bug: Yi 1.5 segmentation fault (#8369, commented on Jul 12, 2024, 0 new comments)
- Bug: ggml/src/ggml.c: In function 'ggml_vec_mad_f16' (#8378, commented on Jul 12, 2024, 0 new comments)
- Bug: Vulkan backend does not work on an Imagination GPU on a RISC-V platform (#8437, commented on Jul 12, 2024, 0 new comments)
- ggml : add DirectML backend (#7772, commented on Jul 12, 2024, 0 new comments)
- train a model from scratch, but f16 or q8 (#8429, commented on Jul 12, 2024, 0 new comments)
- Llama only uses dedicated memory when both shared and dedicated are available (#6743, commented on Jul 12, 2024, 0 new comments)
- Can't run the program (#7181, commented on Jul 12, 2024, 0 new comments)
- Feature Request: Add vocabulary type for token-free models that work on raw bytes (#7763, commented on Jul 12, 2024, 0 new comments)
- Bug: `scripts/run-with-preset.py` fails on `--tensor-split` option when run on non-GPU-enabled system (#7864, commented on Jul 12, 2024, 0 new comments)
- Bug: Using llama-b3091-bin-win-llvm-arm64.zip to run qwen2-0_5b-instruct-q8_0.gguf fails to start. Is it a compilation error in llama-b3091-bin-win-llvm-arm64.zip? (#7873, commented on Jul 12, 2024, 0 new comments)
- Bug: Random output after the last update (#7874, commented on Jul 12, 2024, 0 new comments)
- Feature Request: Add Paligemma support (#7875, commented on Jul 12, 2024, 0 new comments)
- Bug: multithreading for requests, model infer service failed (#7876, commented on Jul 12, 2024, 0 new comments)
- Bug: get-wikitext-103.sh seems not to be working (#7878, commented on Jul 12, 2024, 0 new comments)
- Support for MatMul-free LLMs (#7889, commented on Jul 12, 2024, 0 new comments)
- Investigate gemma 2 generation quality (#8240, commented on Jul 11, 2024, 0 new comments)
- Feature Request: Support for Meta: Multi Token Prediction Models (#8297, commented on Jul 11, 2024, 0 new comments)
- Bug: Missing Port Binding in Docker Run Command (#8419, commented on Jul 11, 2024, 0 new comments)
- Bug: Fatal signal 11 (SIGSEGV) on Google Pixel 8 (dart) (#7908, commented on Jul 11, 2024, 0 new comments)
- Bug: rpc-server --mem doesn't match backend memory (#8417, commented on Jul 11, 2024, 0 new comments)
- Bug: Qwen2-72B-Instruct (and finetunes) Q4_K_M, Q5_K_M generates random output with CuBLAS prompt processing (#8025, commented on Jul 16, 2024, 0 new comments)
- How to evaluate my converted gguf model? Which benchmarks can I run, and how do I run them on my converted model? (#8409, commented on Jul 16, 2024, 0 new comments)
- Server UI: Code snippets are being mangled by <em> italic emphasis replacement (#7023, commented on Jul 16, 2024, 0 new comments)
- Bug: The "server" provided web-ui chat sometimes does not properly quote "<" ">" characters in its HTML output (#7905, commented on Jul 16, 2024, 0 new comments)
- Feature Request: Nemotron-4-340B-Instruct Support (#7966, commented on Jul 15, 2024, 0 new comments)
- Bug: QWEN2 quantization GGML_ASSERT (#7805, commented on Jul 15, 2024, 0 new comments)
- convert.py still fails on llama3 8B-Instruct downloaded directly from Meta (Huggingface works) (#7339, commented on Jul 15, 2024, 0 new comments)
- Question: why does the llama.cpp mobilevlm model (fp16) inference result differ from the official pytorch project's results? Is this normal? (#7614, commented on Jul 15, 2024, 0 new comments)
- Bug: convert-hf-to-gguf.py on Gemma model: ValueError: Duplicated key name 'tokenizer.chat_template' (#7923, commented on Jul 15, 2024, 0 new comments)
- examples/server: "New UI" chat becomes slower with each subsequent message (#7944, commented on Jul 15, 2024, 0 new comments)
- Add support for InternLM 2.5 1M context; should be as good as command r+ (#8285, commented on Jul 14, 2024, 0 new comments)
- Regressions on IQ3_XXS over time (#5856, commented on Jul 14, 2024, 0 new comments)
- Feature Request: tokenized history (#7744, commented on Jul 14, 2024, 0 new comments)
- Any idea of PyramidKV? (#7916, commented on Jul 14, 2024, 0 new comments)
- Bug: Outdated documentation of train-text-from-scratch (#7917, commented on Jul 14, 2024, 0 new comments)
- Bug: Error while converting BERT to GGUF: Can not map tensor 'bert.embeddings.LayerNorm.beta' (#7924, commented on Jul 14, 2024, 0 new comments)
- Why is the single input used incorrect, or no output? (#8276, commented on Jul 13, 2024, 0 new comments)
- Bug: Phi-2 model tokenizer not recognized (#7667, commented on Jul 13, 2024, 0 new comments)
- I am running two socket servers, and the CPU usage is at 50% (#7812, commented on Jul 13, 2024, 0 new comments)
- ci : self-hosted runner issue (#7893, commented on Jul 13, 2024, 0 new comments)
- Bug: convert-hf-to-gguf.py fails for Gemma models (#7897, commented on Jul 13, 2024, 0 new comments)