HandH1998
  • Beijing

Pinned

  1. vllm-project/vllm Public

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 23.2k stars · 3.3k forks

  2. bytedance/lightseq Public

    LightSeq: A High Performance Library for Sequence Processing and Generation

    C++ · 3.1k stars · 323 forks

  3. microsoft/Megatron-DeepSpeed Public

    Forked from NVIDIA/Megatron-LM

    Ongoing research training transformer language models at scale, including: BERT & GPT-2

    Python · 1.8k stars · 333 forks

  4. AniZpZ/AutoSmoothQuant Public

    An easy-to-use package for implementing SmoothQuant for LLMs

    Python · 67 stars · 4 forks

  5. QQQ Public

    QQQ is an innovative and hardware-optimized W4A8 quantization solution.

    Python · 31 stars · 2 forks

  6. IST-DASLab/marlin Public

    FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

    Python · 462 stars · 34 forks