Softvelum news: Nimble Streamer, Larix Broadcaster and more: NVidia NVENC settings for H.265 in Live Transcoder

NVidia® products with Kepler, Maxwell and Pascal generation GPUs contain a dedicated video encoding accelerator, called NVENC, on the GPU die.

Nimble Streamer Live Transcoder fully supports NVidia hardware acceleration for video encoding and decoding. With capable hardware and properly installed drivers, our customers can choose NVENC to handle stream encoding.

The HEVC (H.265) codec is supported by multiple NVidia GPUs, so it's now also supported in our Live Transcoder. You can use the Transcoder to decode and encode H.265 on your hardware under both Linux and Windows.

You can take a look at the list of NVidia GPUs capable of hardware encoding acceleration. To make hardware acceleration work, you need to install the graphics card drivers on the system. Use this link to download and install them. If you haven't installed Nimble Streamer transcoder yet, use this page to find the proper setup instructions.

Note that NVENC has a couple of known issues; please check the Troubleshooting section at the end of this article.

Setting encoder

Transcoding scenarios are created using our web UI. You can check this YouTube playlist to see how various use cases are defined. It takes just a couple of minutes to complete.

Scenarios setup page

Example of HEVC encoding scenario

To configure NVENC, open the encoder settings dialog, choose "nvenc" as the Encoder and select h265 as the Codec.

After that you can add various parameters and set specific values to tune your encoding process. The full list of available encoding parameters is given below.

preset

Specifies H.265 preset.

hp - high performance
default - tradeoff between performance and quality
hq - high quality
llhp - low latency high performance
ll - default low latency preset; its quality and speed are midway between the other two low latency presets
llhq - low latency high quality
lossless - default lossless preset
losslesshp - lossless high performance
bd - Blu-ray Disc quality

profile

Specifies H.265 profile.

main
auto

level

Specifies the level of the encoded bitstream.

auto
1
2
21
3
31
4
41
5
51
52
6
61
62
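
The two-digit values read naturally as HEVC level numbers with the dot removed (31 for level 3.1, and so on). A small Python sketch of that reading; the helper name `level_label` is ours for illustration and is not part of Nimble Streamer:

```python
def level_label(value: str) -> str:
    """Turn a level value such as "31" into a dotted label such as "3.1"."""
    if value == "auto":
        return "auto"
    if not value.isdigit() or len(value) > 2:
        raise ValueError(f"unexpected level value: {value!r}")
    # First digit is the major level, optional second digit is the minor level.
    major, minor = value[0], value[1:]
    return f"{major}.{minor}" if minor else major

print([level_label(v) for v in ["auto", "3", "31", "52"]])
# ['auto', '3', '3.1', '5.2']
```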

gpu

Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on.

If you set it to "auto", the transcoder will choose the least busy GPU.

keyint

Number of pictures within the current GOP (Group of Pictures).

0 - NVENC_INFINITE_GOPLENGTH
1 - only I-frames are used
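
To relate keyint to wall-clock time, divide the GOP length by the frame rate. The sketch below is plain arithmetic under the assumption of a constant frame rate; the function name is hypothetical and nothing here calls Nimble Streamer:

```python
def gop_duration_seconds(keyint: int, fps_n: int, fps_d: int = 1):
    """Seconds covered by one GOP, or None for an infinite GOP (keyint=0)."""
    if keyint < 0 or fps_n <= 0 or fps_d <= 0:
        raise ValueError("keyint must be >= 0 and the frame rate must be positive")
    if keyint == 0:
        return None  # NVENC_INFINITE_GOPLENGTH: no periodic key frames
    return keyint * fps_d / fps_n

print(gop_duration_seconds(60, 30))   # 2.0
print(gop_duration_seconds(50, 25))   # 2.0
print(gop_duration_seconds(0, 30))    # None
```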

bframes

Specifies the maximum number of B-frames between non-B-frames.
0 - no B-frames
1 - IBP
2 - IBBP

refs

Specifies the DPB size used for encoding.

Setting it to 0 lets the driver use the default DPB size. A low latency application that invalidates reference frames as an error resilience tool is advised to use a large DPB size, so that the encoder can keep older reference frames to fall back on if recent frames are invalidated.

fps_n, fps_d

Set the output FPS numerator and denominator. They only affect the num_units_in_tick and time_scale fields in the SPS.
If fps_n=30 and fps_d=1 then it's 30 FPS
If fps_n=60000 and fps_d=2002 then it's 29.97 FPS
The source stream FPS or filter FPS is used if fps_n and fps_d are not set.
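
As a quick check of the arithmetic above, the signalled frame rate is the numerator divided by the denominator. A minimal Python sketch; the function name `output_fps` is ours and is not part of Nimble Streamer:

```python
from fractions import Fraction

def output_fps(fps_n: int, fps_d: int) -> Fraction:
    """Return the exact frame rate expressed by fps_n / fps_d."""
    if fps_n <= 0 or fps_d <= 0:
        raise ValueError("fps_n and fps_d must be positive integers")
    return Fraction(fps_n, fps_d)

print(float(output_fps(30, 1)))                  # 30.0
print(round(float(output_fps(60000, 2002)), 2))  # 29.97
```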

rate_control

Sets bitrate type.

cqp - Constant QP mode
vbr - Variable bitrate mode
cbr - Constant bitrate mode
vbr_minqp - Variable bitrate mode with MinQP
ll_2pass_quality - Multi-pass encoding optimized for image quality; works only in low latency mode
ll_2pass_size - Multi-pass encoding optimized for maintaining frame size; works only in low latency mode
vbr_2pass - Multi-pass VBR
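
Since preset, profile and rate_control only accept the values listed in this article, a parameter set can be sanity-checked before it is typed into the web UI. The sketch below is a hypothetical helper, not a Nimble Streamer API; the parameters are represented as a plain Python dict purely for illustration.

```python
# Allowed values copied from the lists in this article.
ALLOWED = {
    "preset": {"hp", "default", "hq", "llhp", "ll", "llhq",
               "lossless", "losslesshp", "bd"},
    "profile": {"main", "auto"},
    "rate_control": {"cqp", "vbr", "cbr", "vbr_minqp",
                     "ll_2pass_quality", "ll_2pass_size", "vbr_2pass"},
}

def invalid_values(params: dict) -> dict:
    """Return the entries whose value is not in the documented list."""
    return {name: value for name, value in params.items()
            if name in ALLOWED and value not in ALLOWED[name]}

params = {"preset": "hq", "rate_control": "abr", "bitrate": "4000"}
print(invalid_values(params))  # {'rate_control': 'abr'}
```

Parameters without an enumerated list, such as bitrate, are passed through unchecked.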

bitrate

Sets bitrate in Kbps.

max_bitrate

Sets max bitrate in Kbps.

init_bufsize

Specifies the VBV(HRD) initial delay in Kbits.
0 - use the default VBV initial delay

bufsize

Specifies the VBV(HRD) buffer size in Kbits.
0 - use the default VBV buffer size
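
As a rule of thumb, dividing the buffer size in Kbits by the bitrate in Kbps gives the amount of time the VBV buffer spans. The sketch below is only that arithmetic, with a hypothetical function name; it does not apply when bufsize is 0, because the driver default is then used.

```python
def vbv_window_seconds(bufsize_kbits: float, bitrate_kbps: float) -> float:
    """Seconds of video at the target bitrate that fit into the VBV buffer."""
    if bufsize_kbits <= 0 or bitrate_kbps <= 0:
        raise ValueError("both values must be positive; bufsize=0 means driver default")
    return bufsize_kbits / bitrate_kbps

print(vbv_window_seconds(4000, 2000))  # 2.0
print(vbv_window_seconds(2000, 4000))  # 0.5
```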

qpi, qpp, qpb

Specify the initial QP to be used for encoding I-, P- and B-frames respectively. These values are used for all frames in CQP mode.

qmin

Specifies the minimum QP used for rate control.

qmax

Specifies the maximum QP used for rate control.

initialRCQP

Specifies the initial QP used for rate control.

quality

Target constant quality level for VBR mode (range 0-51, where 0 means automatic).

keep_sar

If your input stream is anamorphic, you might need to preserve its SAR parameter in the output as well, especially if you’re using a 'scale' filter in your Transcoder pipeline, because DAR = SAR x Width / Height. Nimble supports keeping the input SAR using the keep-sar parameter set to true for the encoder in its ‘Video output’ section. The SAR/DAR/PAR correlation is described in this article.
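
The DAR formula can be checked with a short worked example. The Python sketch below uses a 720x576 frame with SAR 16:15 as an assumed input (these numbers are illustrative, not taken from a specific stream) and shows that the display aspect ratio survives scaling only if the SAR is kept:

```python
from fractions import Fraction

def display_aspect_ratio(width: int, height: int,
                         sar_num: int = 1, sar_den: int = 1) -> Fraction:
    """DAR = SAR x Width / Height, kept as an exact fraction."""
    return Fraction(sar_num, sar_den) * Fraction(width, height)

print(display_aspect_ratio(720, 576, 16, 15))  # 4/3  (anamorphic source)
print(display_aspect_ratio(360, 288, 16, 15))  # 4/3  (scaled, SAR kept)
print(display_aspect_ratio(360, 288))          # 5/4  (scaled, SAR lost)
```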

monoChromeEncoding

0 - disable
1 - enable

frameFieldMode

Specifies the frame/field mode.

frame - NV_ENC_PARAMS_FRAME_FIELD_MODE_FRAME
filed - NV_ENC_PARAMS_FRAME_FIELD_MODE_FIELD
mbaff - NV_ENC_PARAMS_FRAME_FIELD_MODE_MBAFF

mvPrecision

Specifies the desired motion vector prediction precision.

default - NV_ENC_MV_PRECISION_DEFAULT
full_pell - NV_ENC_MV_PRECISION_FULL_PEL
half_pell - NV_ENC_MV_PRECISION_HALF_PEL
quarter_pel - NV_ENC_MV_PRECISION_QUARTER_PEL

enableAQ

Enable Spatial adaptive quantization.

0 - disable
1 - enable

aqStrength

Specifies AQ strength.
The AQ strength scale ranges from 1 (low) to 15 (aggressive).

enableTemporalAQ

Specifies Temporal adaptive quantization.

0 - disable
1 - enable

strictGOPTarget

Enable this to minimize GOP-to-GOP rate fluctuations.

0 - disable
1 - enable

enableLookahead

Enable lookahead with depth <lookaheadDepth>.

lookaheadDepth

Maximum depth of lookahead with range 0-32 (only used if enableLookahead=1)

disableIadapt

Disable adaptive I-frame insertion at scene cuts (only has an effect when lookahead is enabled).

0 - none
1 - disable adaptive I-frame insertion

disableBadapt

Disable adaptive B-frame decision (only has an effect when lookahead is enabled)

0 - none
1 - Disable adaptive B-frame decision

minCUSize

Specifies the minimum size of luma coding unit.

auto
8 - 8x8
16 - 16x16
32 - 32x32
64 - 64x64

maxCUSize

Specifies the maximum size of luma coding unit. Currently the NVENC SDK only supports maxCUSize equal to 32.

auto
8 - 8x8
16 - 16x16
32 - 32x32
64 - 64x64

useConstrainedIntraPred

Constrained intra prediction.

0 - disable(default)
1 - enable

disableDeblockAcrossSliceBoundary

Controls loop filtering across slice boundaries.

0 - enable(default)
1 - disable

outputAUD

Write Access Unit Delimiter syntax:

0 - disable
1 - enable(default)

enableLTR

Use of long term reference pictures for inter prediction.

0 - disable(default)
1 - enable

enableIntraRefresh

Gradual decoder refresh or intra refresh. If the GOP structure uses B frames this will be ignored.

0 - disable
1 - enable

intraRefreshPeriod

Specifies the interval between successive intra refreshes.
Requires enableIntraRefresh to be set.
Will be disabled if NV_ENC_CONFIG::gopLength is not set to NVENC_INFINITE_GOPLENGTH (keyint=0).

intraRefreshCnt

Specifies the length of intra refresh in number of frames for periodic intra refresh.
This value should be smaller than intraRefreshPeriod.
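
The intra refresh constraints stated above (no B-frames, infinite GOP, refresh length shorter than the period) can be collected into one check. This is a hedged sketch with a hypothetical helper name; the parameters are held in a plain dict for illustration, and the checks only mirror the rules described in this article rather than the driver's actual behavior.

```python
def intra_refresh_problems(params: dict) -> list:
    """List the documented intra refresh constraints that a parameter set violates."""
    if str(params.get("enableIntraRefresh", "0")) != "1":
        return []  # intra refresh is off, nothing to check
    problems = []
    if int(params.get("bframes", 0)) > 0:
        problems.append("bframes > 0, so intra refresh is ignored")
    if str(params.get("keyint")) != "0":
        problems.append("keyint is not 0 (infinite GOP), so intra refresh is disabled")
    period = int(params.get("intraRefreshPeriod", 0))
    count = int(params.get("intraRefreshCnt", 0))
    if count >= period:
        problems.append("intraRefreshCnt should be smaller than intraRefreshPeriod")
    return problems

good = {"enableIntraRefresh": "1", "keyint": "0", "bframes": "0",
        "intraRefreshPeriod": "60", "intraRefreshCnt": "10"}
bad = {"enableIntraRefresh": "1", "keyint": "60", "bframes": "2",
       "intraRefreshPeriod": "30", "intraRefreshCnt": "30"}
print(intra_refresh_problems(good))      # []
print(len(intra_refresh_problems(bad)))  # 3
```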

ltrTrustMode

Specifies the LTR operating mode.
Set to 0 to disallow encoding using LTR frames until later specified.
Set to 1 to allow encoding using LTR frames unless later invalidated.

ltrNumFrames

Specifies the number of LTR frames used.

If ltrTrustMode=1, the encoder will mark the first ltrNumFrames base layer reference frames within each IDR interval as LTR.
If ltrMarkFrame=1, ltrNumFrames specifies the maximum number of LTR frames in the DPB.
If the ltrNumFrames value is more than the DPB size (refs), the encoder will make the decision on its own.

sliceMode

sliceMode, in conjunction with sliceModeData, specifies the way in which the picture is divided into slices.

sliceMode = 0 - CTU based slices
sliceMode = 1 - byte based slices
sliceMode = 2 - CTU row based slices
sliceMode = 3 - numSlices in picture
When sliceMode = 0 and sliceModeData = 0, the whole picture is coded as one slice.

sliceModeData

Specifies the parameter needed for sliceMode. For:

sliceMode = 0, sliceModeData specifies # of CTUs in each slice (except last slice)
sliceMode = 1, sliceModeData specifies maximum # of bytes in each slice (except last slice)
sliceMode = 2, sliceModeData specifies # of CTU rows in each slice (except last slice)
sliceMode = 3, sliceModeData specifies number of slices in the picture. Driver will divide picture into slices optimally.
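
To see how the two parameters interact, the number of slices per picture can be estimated from the frame size. The Python sketch below assumes a 32x32 CTU (based on the maxCUSize note in this article) and a hypothetical function name; for byte based slices the count depends on the encoded size, so the sketch returns None.

```python
import math

def slices_per_picture(width: int, height: int, slice_mode: int,
                       slice_mode_data: int, ctu_size: int = 32):
    """Estimate the slice count for a picture; None when it cannot be known upfront."""
    ctu_cols = math.ceil(width / ctu_size)
    ctu_rows = math.ceil(height / ctu_size)
    if slice_mode == 0:
        if slice_mode_data == 0:
            return 1  # whole picture coded as one slice
        return math.ceil(ctu_cols * ctu_rows / slice_mode_data)
    if slice_mode == 1:
        return None  # byte based: depends on the size of the encoded picture
    if slice_mode_data <= 0:
        raise ValueError("sliceModeData must be positive for sliceMode 2 and 3")
    if slice_mode == 2:
        return math.ceil(ctu_rows / slice_mode_data)
    if slice_mode == 3:
        return slice_mode_data
    raise ValueError("sliceMode must be 0, 1, 2 or 3")

# 1920x1080 with 32x32 CTUs is 60 columns by 34 rows, 2040 CTUs in total.
print(slices_per_picture(1920, 1080, 0, 0))    # 1
print(slices_per_picture(1920, 1080, 0, 510))  # 4
print(slices_per_picture(1920, 1080, 2, 4))    # 9
print(slices_per_picture(1920, 1080, 3, 4))    # 4
```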

maxTemporalLayersMinus1

Specifies the max temporal layer used for hierarchical coding.

These are the parameters you can currently use to control NVidia video encoding hardware acceleration.

Troubleshooting

Linux drivers

The Linux version of NVENC has some known issues with HEVC support on outdated driver versions.
If you see the following message in nimble.log:
failed to lock output bitstream encoder=0x3d871f0, status=8
then you need to update your driver to version 381.22 or higher. Version 378.13 has issues described in this discussion.
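
Both checks from this section can be scripted. The sketch below looks for the error message in log lines and compares a driver version string against 381.22; the function names are ours, and the regular expression assumes the message always has the shape shown above.

```python
import re

MIN_DRIVER = (381, 22)
# Matches the message quoted above regardless of the encoder address.
LOCK_ERROR = re.compile(r"failed to lock output bitstream .*status=8\b")

def has_lock_error(log_lines) -> bool:
    """True if any log line contains the 'failed to lock output bitstream' error."""
    return any(LOCK_ERROR.search(line) for line in log_lines)

def driver_is_new_enough(version: str) -> bool:
    """Compare a 'major.minor' driver version string against 381.22."""
    parts = tuple(int(p) for p in version.split("."))
    return parts >= MIN_DRIVER

sample = ["failed to lock output bitstream encoder=0x3d871f0, status=8"]
print(has_lock_error(sample))          # True
print(driver_is_new_enough("378.13"))  # False
print(driver_is_new_enough("381.22"))  # True
```

In practice the lines would come from reading nimble.log, and the version string from a tool such as nvidia-smi (for example `nvidia-smi --query-gpu=driver_version --format=csv,noheader`).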

GTX 950

The Linux version of NVENC does not allow more than one HEVC decoding session on GTX 950. This is a known issue which also reproduces with ffmpeg. This problem might not exist on other cards.

Other issues

Please check the Transcoder troubleshooting article, which covers the most frequent issues.

We keep improving our transcoder feature set, so please contact us with any questions.

June 7, 2017