
Support flash attention in --server mode
jart committed Jun 29, 2024
1 parent 7692b85 commit 4aea606
Showing 1 changed file with 4 additions and 0 deletions.
llama.cpp/server/server.cpp: 4 additions & 0 deletions
@@ -2278,6 +2278,10 @@ static void server_params_parse(int argc, char **argv, server_params &sparams,
             sparams.read_timeout = std::stoi(argv[i]);
             sparams.write_timeout = std::stoi(argv[i]);
         }
+        else if (arg == "-fa" || arg == "--flash-attn") {
+            params.flash_attn = true;
+            FLAG_flash_attn = true;
+        }
         else if (arg == "-m" || arg == "--model")
         {
             if (++i >= argc)

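With this change, flash attention can be switched on when llamafile runs in server mode: the new branch sets both llama.cpp's params.flash_attn and llamafile's global FLAG_flash_attn, so both code paths observe the setting. A minimal usage sketch (the model filename is illustrative, not part of this commit):

    llamafile --server -m model.gguf --flash-attn

The short form -fa is accepted as well, matching the alias llama.cpp uses for the same option.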