Prevent vector overflow in llama.cpp
This issue only impacts the new server code; it surfaces when the llama_model destructor runs.
jart committed Jun 30, 2024
1 parent 571b4e5 commit cd73243
Showing 1 changed file with 2 additions and 1 deletion.
llama.cpp/llama.cpp: 3 changes (2 additions, 1 deletion)
@@ -4616,7 +4616,8 @@ static bool llm_load_tensors(
     model.n_gpu_layers = n_gpu_layers;
 
     const int64_t n_layer = hparams.n_layer;
-    const int64_t i_gpu_start = std::max((int64_t) hparams.n_layer - n_gpu_layers, (int64_t) 0);
+    const int n_gpu = std::min(n_gpu_layers, int(n_layer)); // [jart] prevent vector overflow
+    const int64_t i_gpu_start = std::max((int64_t) hparams.n_layer - n_gpu, (int64_t) 0);
     bool use_mmap_buffer = true;
 
     // there is very little benefit to offloading the input layer, so always keep it on the CPU
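For context, here is a minimal standalone sketch of the kind of failure this clamp guards against. It is illustrative only, not code from llama.cpp: the vector name, the layer counts, and the assumption that the overflow stems from using an unclamped GPU-layer count to index a per-layer vector are all hypothetical. The point it shows is that a caller can request more offloaded layers than the model has, and clamping the request to n_layer first keeps every per-layer index in range.

    // Illustrative sketch only -- not the actual llama.cpp code path.
    // Assumes the overflow comes from indexing a per-layer vector with an
    // unclamped GPU-layer count; names and sizes here are made up.
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        const int64_t n_layer      = 32;  // layers the model actually has
        const int     n_gpu_layers = 99;  // a caller may request more than exist

        // Clamp the requested count to the real layer count, as the commit does.
        // With the raw n_gpu_layers (99), the loop below would compute negative
        // indices and write outside the vector; clamping keeps every index in
        // [0, n_layer).
        const int n_gpu = std::min(n_gpu_layers, int(n_layer));

        std::vector<int> layer_buft(n_layer);  // one slot per layer
        for (int i = 0; i < n_gpu; ++i) {
            layer_buft[n_layer - 1 - i] = 1;   // offload the last n_gpu layers
        }
        std::printf("offloaded the last %d of %lld layers\n",
                    n_gpu, (long long) n_layer);
        return 0;
    }

Under these assumptions, the one-line clamp is cheaper and more local than validating n_gpu_layers at every call site, which matches the shape of the actual patch above.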
