Hopper
Jul 02, 2024
Achieving High Mixtral 8x7B Performance with NVIDIA H100 Tensor Core GPUs and TensorRT-LLM
As large language models (LLMs) continue to grow in size and complexity, the performance requirements for serving them quickly and cost-effectively continue to...
9 MIN READ
Jul 01, 2024
How Cutting-Edge Computer Chips are Speeding Up the AI Revolution
Featured in Nature, this post delves into how GPUs and other advanced technologies are meeting the computational challenges posed by AI.
1 MIN READ
Jun 12, 2024
Introducing Grouped GEMM APIs in cuBLAS and More Performance Updates
The latest release of the NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance...
7 MIN READ
Jun 12, 2024
NVIDIA Sets New Generative AI Performance and Scale Records in MLPerf Training v4.0
Generative AI models have a variety of uses, such as helping write computer code, crafting stories, composing music, generating images, producing videos, and...
11 MIN READ
Apr 25, 2024
Announcing Confidential Computing General Access on NVIDIA H100 Tensor Core GPUs
NVIDIA launched the initial release of the Confidential Computing (CC) solution in private preview for early access in July 2023 through NVIDIA LaunchPad....
3 MIN READ
Mar 27, 2024
NVIDIA H200 Tensor Core GPUs and NVIDIA TensorRT-LLM Set MLPerf LLM Inference Records
Generative AI is unlocking new computing applications that greatly augment human capability, enabled by continued model innovation. Generative AI...
11 MIN READ
Mar 25, 2024
Building High-Performance Applications in the Era of Accelerated Computing
AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements...
6 MIN READ
Mar 06, 2024
How to Accelerate Quantitative Finance with ISO C++ Standard Parallelism
Quantitative finance libraries are software packages that consist of mathematical, statistical, and, more recently, machine learning models designed for use in...
10 MIN READ
Dec 18, 2023
Deploying Retrieval-Augmented Generation Applications on NVIDIA GH200 Delivers Accelerated Performance
Large language model (LLM) applications are essential in enhancing productivity across industries through natural language. However, their effectiveness is...
10 MIN READ
Dec 14, 2023
Achieving Top Inference Performance with the NVIDIA H100 Tensor Core GPU and NVIDIA TensorRT-LLM
Best-in-class AI performance requires an efficient parallel computing architecture, a productive tool stack, and deeply optimized algorithms. NVIDIA released...
4 MIN READ
Dec 04, 2023
NVIDIA TensorRT-LLM Enhancements Deliver Massive Large Language Model Speedups on NVIDIA H200
Large language models (LLMs) have seen dramatic growth over the last year, and the challenge of delivering great user experiences depends on both high-compute...
5 MIN READ
Dec 04, 2023
New NVIDIA NeMo Framework Features and NVIDIA H200 Supercharge LLM Training Performance and Versatility
The rapid growth in the size, complexity, and diversity of large language models (LLMs) continues to drive an insatiable need for AI training performance....
9 MIN READ
Nov 28, 2023
One Giant Superchip for LLMs, Recommenders, and GNNs: Introducing NVIDIA GH200 NVL32
At AWS re:Invent 2023, AWS and NVIDIA announced that AWS will be the first cloud provider to offer NVIDIA GH200 Grace Hopper Superchips interconnected with...
9 MIN READ
Nov 16, 2023
Unlock the Power of NVIDIA Grace and NVIDIA Hopper Architectures with Foundational HPC Software
High-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern...
7 MIN READ
Nov 13, 2023
Simplifying GPU Programming for HPC with NVIDIA Grace Hopper Superchip
The new hardware developments in NVIDIA Grace Hopper Superchip systems enable some dramatic changes to the way developers approach GPU programming. Most...
17 MIN READ
Nov 08, 2023
Setting New Records at Data Center Scale Using NVIDIA H100 GPUs and NVIDIA Quantum-2 InfiniBand
Generative AI is rapidly transforming computing, unlocking new use cases and turbocharging existing ones. Large language models (LLMs), such as OpenAI’s GPT...
19 MIN READ