Improving Video Quality with the NVIDIA Video Codec SDK 12.2 for HEVC

NVIDIA Video Codec SDK provides a comprehensive set of APIs for hardware-accelerated video encode and decode on Windows and Linux. The 12.2 release improves video quality for high-efficiency video coding (HEVC). It offers a significant reduction in bit rates, particularly for natural video content. This post details the following new features:

Lookahead level increases lookahead analysis to improve quality (HEVC only).
Temporal filtering helps filter out noise to improve compression efficiency (HEVC only).
High bit-depth encoding encodes 8-bit content as 10-bit (HEVC and AV1).
Unidirectional B-frames (instead of P-frames) improve quality, which is especially useful in latency-sensitive use cases (HEVC only).
UHQ (ultra high quality) tuning info provides the best quality in latency-tolerant use cases (HEVC only).

Lookahead level

Lookahead analyzes future frames so that bits can be allocated efficiently across frames, which improves coding efficiency. The analysis uses coding tree units (CTUs), along with other encoding statistics, to improve rate control. This is useful for latency-tolerant encoding. Video Codec SDK 12.2 provides up to four lookahead levels, each with a different tradeoff between performance and quality.

Use the following settings:

enableLookahead turns on lookahead.
lookaheadDepth determines the number of frames to be buffered in the lookahead queue for analysis.
lookaheadLevel determines the level of analysis performed to improve quality. A higher lookahead level gives better quality than a lower one, at the cost of reduced performance and increased video memory usage.
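
To make the idea of lookahead-driven bit allocation concrete, here is a toy sketch in Python. It is not the NVENC rate-control algorithm. The function name `allocate_bits`, the per-frame "complexity" scores, and the proportional-share rule are all assumptions made for illustration. The `depth` parameter plays the role of the lookahead queue length; the lookahead level is not modeled.

```python
from typing import List


def allocate_bits(complexities: List[float], depth: int,
                  budget_per_frame: float) -> List[float]:
    """Toy lookahead allocation (hypothetical, not the NVENC algorithm).

    For frame i, inspect frames i .. i+depth-1 (the lookahead window) and
    scale the nominal per-frame budget by how complex frame i is relative
    to the mean complexity of that window.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    bits = []
    for i, c in enumerate(complexities):
        window = complexities[i:i + depth]  # shorter near the end of the clip
        mean = sum(window) / len(window)
        # A frame that is simpler than what follows gives up bits;
        # a frame that is more complex than what follows receives more.
        scale = c / mean if mean > 0 else 1.0
        bits.append(budget_per_frame * scale)
    return bits
```

With a depth of 1 the encoder sees only the current frame, so every frame gets the nominal budget. With a larger depth, a simple frame that precedes a complex one is given fewer bits, which leaves headroom for the complex frame.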

Temporal filtering

Natural video content captured with a camera contains noise that may come from a range of factors, including sensor noise. The noise can reduce temporal redundancy and increase the number of bits needed to encode the video, reducing compression efficiency.

Temporal filtering helps reduce this noise by using patches from neighboring frames to filter the current frame. For each patch in the current frame, the temporal filter uses motion estimation to find the matching patches in adjacent frames and then uses those patches to filter the current frame. The filtering is done using a CUDA kernel, while the motion estimation is performed using the NVIDIA encoder (NVENC) hardware.

Temporal filtering can be enabled by setting tfLevel. Temporal filtering can provide average coding gains of 4-5% for natural video content. This feature may not provide significant gains for synthetic or screen content.

Diagram shows the past and future frames that get used for filtering the current frame. — *Figure 1. Frame use for temporal filtering*
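
The following Python sketch illustrates the principle of motion-compensated temporal filtering on one-dimensional "frames". It is a simplification under stated assumptions, not the SDK's filter: the names `best_match` and `temporal_filter` are hypothetical, matching uses a plain sum of absolute differences, only one past and one future frame are used, all three frames are assumed to have the same length, and the matched patches are averaged with equal weights.

```python
from typing import List, Sequence


def best_match(patch: Sequence[int], frame: Sequence[int],
               center: int, search: int) -> List[int]:
    """Find the patch in `frame`, within `search` samples of `center`,
    that has the lowest sum of absolute differences (SAD) to `patch`."""
    n = len(patch)
    best, best_sad = None, None
    lo = max(0, center - search)
    hi = min(len(frame) - n, center + search)
    for start in range(lo, hi + 1):
        cand = frame[start:start + n]
        sad = sum(abs(a - b) for a, b in zip(patch, cand))
        if best_sad is None or sad < best_sad:
            best, best_sad = list(cand), sad
    return best


def temporal_filter(prev: Sequence[int], cur: Sequence[int],
                    nxt: Sequence[int], patch_size: int = 4,
                    search: int = 2) -> List[float]:
    """Average each patch of `cur` with its best matches in `prev` and `nxt`."""
    out = []
    for start in range(0, len(cur), patch_size):
        patch = cur[start:start + patch_size]
        matches = [best_match(patch, f, start, search) for f in (prev, nxt)]
        for i, v in enumerate(patch):
            # Equal-weight average of the current sample and its matches.
            out.append((v + sum(m[i] for m in matches)) / (1 + len(matches)))
    return out
```

Because the averaging is done along the motion trajectory rather than at fixed positions, a moving object is not blurred, while uncorrelated noise on the current frame is pulled toward the values seen in the neighboring frames.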

We recommend using lookahead level and temporal filtering together. The two features complement each other: the Bjøntegaard delta bit rate (BD-BR) savings with both features enabled are greater than the sum of the BD-BR savings offered by each feature individually. However, there is no restriction on using these features individually, and each feature can be programmed separately.

High bit-depth encoding

Encoding 8-bit content as 10-bit content can provide coding gains because it improves correlation, which results in better compression. To enable high bit-depth encoding in Video Codec SDK 12.2, set inputBitDepth = 8 and outputBitDepth = 10.

The conversion from 8-bit to 10-bit happens within the NVENC driver, using CUDA for HEVC and hardware for AV1, and has a negligible performance penalty. You can expect coding efficiency gains of 3-4% on average by enabling high bit-depth encoding, with higher gains for specific sequences.

Diagram shows the video pipeline on both the encode and decode sides when using the high bit-depth encoding. The 8-bit to 10-bit conversion on the encode side happens within the driver. — *Figure 2. Pipeline for using high bit-depth encoding*
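
As a rough illustration of what an 8-bit to 10-bit conversion can look like, the Python sketch below scales samples by a left shift of two bits and shows that the original 8-bit values can be recovered afterward. The post does not state how the driver performs the conversion, so the shift-based mapping and the names `to_10bit` and `to_8bit` are assumptions for illustration only.

```python
from typing import List, Sequence


def to_10bit(samples_8bit: Sequence[int]) -> List[int]:
    """Scale 8-bit samples (0..255) into the 10-bit range by shifting left 2 bits.

    Assumed mapping for illustration; the driver's actual method is not
    described in the post.
    """
    for s in samples_8bit:
        if not 0 <= s <= 255:
            raise ValueError(f"not an 8-bit sample: {s}")
    return [s << 2 for s in samples_8bit]


def to_8bit(samples_10bit: Sequence[int]) -> List[int]:
    """Map 10-bit samples (0..1023) back to 8 bits by dropping the 2 low bits."""
    for s in samples_10bit:
        if not 0 <= s <= 1023:
            raise ValueError(f"not a 10-bit sample: {s}")
    return [s >> 2 for s in samples_10bit]
```

Under this mapping the round trip is lossless for unmodified samples, for example `to_8bit(to_10bit(row)) == row` for any 8-bit `row`. The extra two bits give the encoder finer precision for intermediate values during encoding.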

Client applications can use this feature when both the encoder and the decoder are under their control. If the decoder supports 10-bit decoding, we strongly recommend using this feature, as it improves coding efficiency.

However, the feature increases GPU memory utilization, because the internal buffers are allocated in 10-bit format.

Unidirectional B-frames

Unidirectional B-frames are special B-frames that use predictions only from past frames and so avoid the latency problems associated with conventional B-frames. This makes them useful for low-latency encoding. Conventional B-frames predict from both the past (L0) and future (L1) directions, while unidirectional B-frames predict only from the past direction.

Unidirectional B-frames predict from the two previous frames, whereas each block in a P-frame predicts from a single previous frame. Enable unidirectional B-frames by setting enableUniDirectionalB = 1. Compared with P-frames, coding efficiency improves by an average of 3-4% (BD-BR savings), with gains reaching up to 15% for specific sequences.

Figure 3 shows different pictures used for reference list creation for a sequence having only I/P frames. The reference list L0 uses pictures only from the past.

Figure 4 shows the use of different pictures for reference lists L0 and L1 for a normal B-frame. The L0 list uses pictures from the past, and the L1 list uses pictures from the future.

Figure 5 shows the use of different pictures for reference lists L0 and L1 for unidirectional B-frames. Both the L0 and L1 reference lists are created from previous pictures only.
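
The three reference-list arrangements described for Figures 3 through 5 can be sketched in Python as follows. This is an illustrative model, not encoder code: the function `build_ref_lists`, the frame-type labels, the use of picture order counts (POCs) as plain integers, and the choice to give a unidirectional B-frame an L1 list identical to L0 are all assumptions.

```python
from typing import Iterable, List, Tuple


def build_ref_lists(current_poc: int, decoded_pocs: Iterable[int],
                    frame_type: str, num_refs: int = 2
                    ) -> Tuple[List[int], List[int]]:
    """Return (L0, L1) as lists of picture order counts, nearest picture first."""
    decoded = list(decoded_pocs)
    past = sorted((p for p in decoded if p < current_poc), reverse=True)
    future = sorted(p for p in decoded if p > current_poc)
    if frame_type == "P":
        # P-frame: a single list built from past pictures.
        return past[:num_refs], []
    if frame_type == "B":
        # Conventional B-frame: L0 from the past, L1 from the future.
        return past[:num_refs], future[:num_refs]
    if frame_type == "UNI_B":
        # Unidirectional B-frame: both lists come from past pictures only.
        return past[:num_refs], past[:num_refs]
    raise ValueError(f"unknown frame type: {frame_type}")
```

For a frame at POC 4 with pictures 0, 1, 2, 3, and 8 already decoded, the conventional B-frame case returns an L1 list containing picture 8, which is why that picture must be encoded first and why latency increases. The unidirectional case never references a picture later than the current one.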

UHQ tuning info

UHQ tuning info combines lookahead level and temporal filtering to provide the highest quality for latency-tolerant encoding, with the best quality and performance tradeoff for each preset. It automatically applies optimal settings for lookahead and temporal filtering, so client applications can use UHQ tuning info instead of tuning both features individually.

This feature also fixes the number of B-frames to five, while using the middle B-frames as a reference. UHQ also disables adaptive I and adaptive B-frames and uses a fixed GOP (group of pictures) structure.
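
A fixed GOP structure with five B-frames and a middle reference B-frame can be pictured with the short Python sketch below. It lists frame types in display order only. The function name `uhq_like_gop`, the type labels, the choice of the third of the five B-frames as the reference, and the handling of a trailing partial group are assumptions for illustration and are not taken from the SDK.

```python
from typing import List


def uhq_like_gop(num_frames: int, num_b: int = 5) -> List[str]:
    """Label frames in display order for a fixed structure with `num_b`
    B-frames between anchor frames (hypothetical illustration)."""
    period = num_b + 1            # one anchor frame plus num_b B-frames
    middle = (num_b + 1) // 2     # position of the middle B-frame (3 of 5)
    types = []
    for i in range(num_frames):
        pos = i % period
        if i == 0:
            types.append("I")      # first frame of the sequence
        elif pos == 0:
            types.append("P")      # anchor frame closing each group
        elif pos == middle:
            types.append("B_REF")  # middle B-frame, used as a reference
        else:
            types.append("B")      # non-reference B-frame
    return types
```

For 13 frames this yields I, B, B, B_REF, B, B, P, B, B, B_REF, B, B, P. Because adaptive I-frame and B-frame placement is disabled, the pattern repeats unchanged regardless of content.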

Figure 6 shows the bit-rate savings and performance for HQ and UHQ tuning info for the P1, P4, and P7 presets, along with high bit-depth encoding, using x265 medium as the reference point. The P4 and P7 presets on UHQ tuning info beat x265 slow in terms of bit-rate savings. The P1 preset on UHQ tuning info has a bit rate only 3% higher than x265 slow. UHQ P1 provides 4x the FPS of x265, while UHQ P4 can provide up to 3x the FPS of x265 slow.

Chart showing the bit rate savings and performance for HQ and UHQ Tuning Info and compares this against x265 medium as the reference point. — *Figure 6. Bit-rate savings and performance for HQ and UHQ tuning info*

Get started with Video Codec SDK 12.2

Video Codec SDK 12.2 improves video encoding quality for HEVC encoding with a significant reduction in bit rates, especially for natural video content. New features include lookahead level, temporal filtering, higher bit-depth encoding, unidirectional B-frames, and UHQ tuning info.

Download Video Codec SDK 12.2 to get started, and join the conversation in the NVIDIA Developer forums. To learn more about the new video capabilities, see Improving Video Quality and Performance with AV1 and NVIDIA Ada Lovelace Architecture.