August 26, 2015

Achieving low latency for live streaming using Nimble Streamer

Latency is something one part of media companies doesn't care about, while the other part is deeply concerned with it. If you provide content that requires real-time interaction, or that reaches a wide audience simultaneously, like a major sports broadcast, then latency is important for you. And it isn't limited to media companies. Nowadays almost anyone can pick up a modern mobile device, install a streaming application and broadcast himself or herself all over the globe.

In the video world, latency is the amount of time between the moment a frame is captured and the moment that frame is displayed. Low latency is a crucial requirement for many streaming scenarios. For example, surveillance systems require an immediate reaction from the observer to any restricted action. Remote participation in a public auction doesn't make sense if the buyer sees a situation that happened several seconds earlier. Popular sports broadcasts, video conferencing and remote quadcopter piloting are other obvious cases.

There is no specific common value that defines "low latency". Instead, what counts as acceptably low latency is defined by the field of application. For example, a 600 millisecond delay is considered acceptable for a remote auction participation system, while certain medical instruments require latency under 30 milliseconds. Of course, those systems are completely different from a technical point of view, and the medical system approach would never suit a widely distributed communication system.
Video content passes through several stages of processing between being captured and being displayed to the viewer, and each stage contributes to the total delay. The main contributors are the stages that require temporary data storage, i.e. buffering. Buffering is imposed whenever processing must wait until some specific amount of data is available. Therefore, to achieve suitable latency for a given system, we need to inspect the system's configuration, identify the major contributing stages and find a way to reduce buffering.

In this article we consider a system streaming H.264 1080p30 video content from a camera or encoder to a display over the Internet, using a media server as an intermediary. The processing pipeline can be broken down as follows:

1. Video capture


A short step that depends entirely on hardware. It can be excluded from consideration due to its low impact on the delay (< 0.5 ms).


2. Encoding (video compression)


A very important step that influences all subsequent steps. Hardware encoders are preferred over software ones from the latency point of view, because software codecs typically incur latency overheads related to memory transfers and task-level management by the OS. A correctly set up encoder doesn't introduce noticeable delay itself, but it defines the bit rate characteristics of the resulting stream. Bit rate comes in two flavors: variable (VBR) and constant (CBR).

The main advantage of VBR is that it produces a better quality-to-space ratio than CBR for the overall data. However, it requires more processing time and causes problems during streaming whenever the instantaneous bit rate exceeds the data rate of the communications path, thus growing the decoder's buffer (as mentioned earlier, buffering is imposed whenever processing must wait for particular data). That's why CBR is mandatory for real-life low-latency video systems.

There is more to consider about CBR. In fact CBR isn't constant at any level, because H.264 video content consists of frames of different sizes. So the encoder performs bit rate control to force out the same amount of stream data over equal periods of time, called averaging periods. Bit rate control comes at the expense of quality: the shorter the averaging period, and therefore the decoder's buffer, the lower the quality of the streamed video.
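As a rough illustration of the trade-off, here is a sketch of the buffering that a CBR averaging period forces onto the decoder. The 4 Mbps bit rate and 0.5 s period are assumed figures for a 1080p30 stream, not measurements:

```python
# Sketch: estimate the decoder-side buffering implied by a CBR
# averaging period. All figures below are illustrative assumptions.

def cbr_buffer_bits(bitrate_bps: float, averaging_period_s: float) -> float:
    """Bits the decoder must buffer so playback survives the bit rate
    fluctuating within one averaging period."""
    return bitrate_bps * averaging_period_s

def buffer_latency_ms(averaging_period_s: float) -> float:
    """Worst-case latency contribution of that buffer: the decoder may
    wait up to a full averaging period for the data to arrive."""
    return averaging_period_s * 1000.0

# Example: 1080p30 H.264 at 4 Mbps with a 0.5 s averaging period
bits = cbr_buffer_bits(4_000_000, 0.5)   # 2,000,000 bits (~244 KiB)
delay = buffer_latency_ms(0.5)           # up to 500 ms
print(f"buffer: {bits / 8 / 1024:.0f} KiB, added latency: up to {delay:.0f} ms")
```

Halving the averaging period halves this worst-case buffering delay, at the cost of tighter rate control and therefore lower picture quality.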

Production encoders provide various bit rate control capabilities intended to deliver CBR with minimal impact on quality. A description of those capabilities is beyond the scope of this article, but there are key features you can use to distinguish latency-efficient encoders, including sub-frame Rate Control Granularity and Content-Adaptive Rate Control.


3. Streaming to media server over local network


This step's contribution consists of the local network delay, which depends on configuration and hardware, possible internal buffering in the encoder, and streaming protocol overhead. To get the best out of this step, the encoder should be configured to hold at most a few frames in its buffer and connected as close as possible to the media server. The streaming protocol should be selected depending on the encoder's and media server's capabilities. The recommended protocols with negligible latency overhead are MPEG2-TS, RTSP and RTMP.
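The "at most a few frames" guideline translates directly into milliseconds of delay; a minimal sketch (frame counts assumed for illustration):

```python
# Sketch: latency added by frames held in the encoder's output buffer.

def encoder_buffer_ms(frames_buffered: int, fps: float) -> float:
    """Delay contributed by frames sitting in the encoder buffer."""
    return frames_buffered / fps * 1000.0

# Two frames at 30 fps add roughly 67 ms; ten frames add a third of a second.
print(f"{encoder_buffer_ms(2, 30):.1f} ms")
print(f"{encoder_buffer_ms(10, 30):.1f} ms")
```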


4. Streaming to viewer's device over Internet


This is the most interesting step and the biggest contributor in many scenarios. However, with an efficient media server, a real-time streaming protocol and a reliable Internet connection, this step can add a delay comparable to, or even less than, the next one. The first contributor within this step is the media server's internal buffering, used while transmuxing the input stream from the encoder. The second is media server buffering related to streaming protocol specifics.

If you want to use an HTTP-based protocol such as HLS or MPEG-DASH to deliver video content to a viewer, be ready for significantly increased streaming latency. The reason is that HTTP-based protocols are designed to deliver media content chunked into segments. The segment size may vary depending on the protocol's parameters, but it shouldn't be less than 2 seconds. For example, the Apple HLS implementation has a segment size of 10 seconds. So, if you use HLS in that configuration, this step alone adds more than 10 seconds to your video streaming system's end-to-end latency.
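To see why segmented delivery hurts, consider this sketch. The assumption that a player waits for three full segments before starting playback is a common default, not a figure from this article:

```python
# Sketch: minimum latency a segment-based player adds on top of
# everything else. Players typically wait for several full segments
# before starting playback; three is assumed here as a common default.

def segmented_latency_s(segment_duration_s: float, segments_buffered: int = 3) -> float:
    """Latency added just by segment buffering, ignoring encode,
    network and decode delays."""
    return segment_duration_s * segments_buffered

print(segmented_latency_s(10))  # 10 s Apple HLS segments -> 30.0 s
print(segmented_latency_s(2))   # 2 s segments -> 6.0 s, still far from "low"
```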

You may suppose that HLS or MPEG-DASH can be configured to achieve really low latency by reducing the segment size. Theoretically that can be done, but only in a very specific environment with almost no restrictions on networking and software capabilities, which is far from real life. If you want to create a low latency video system in a real environment, you should use real-time streaming protocols, such as RTMP and RTSP, to deliver content to your viewers.

The last part of this step, which can't be configured but must be taken into account, is network delay and jitter. The network delay is simply added to the total latency value, while jitter is accounted for in the decoder's buffer size.


5. Decoding


It's not obvious, but this step can be the major contributor. To make sure the decoder doesn't run out of data during playback, the playout buffer must store all the data corresponding to one complete averaging period of the stream, with allowance for network jitter. So the buffer may range from several GOPs down to a few frames, depending on encoder and network parameters. Many players use a default minimum playout buffer of 1 second and adjust it during playback. The smallest possible playout buffer can be achieved using hardware decoders (players) such as the Raspberry Pi.
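A back-of-the-envelope sizing of the playout buffer combines the averaging period with the measured jitter. The safety factor and the example figures below are assumptions for illustration:

```python
# Sketch: size the playout buffer from the encoder's CBR averaging
# period plus network jitter headroom. All values are illustrative.

def playout_buffer_ms(averaging_period_ms: float,
                      jitter_ms: float,
                      safety: float = 1.25) -> float:
    """Buffer must cover one full averaging period plus jitter, with a
    small safety factor so the decoder never underruns."""
    return (averaging_period_ms + jitter_ms) * safety

# Tight setup: 100 ms averaging period, 50 ms jitter -> 187.5 ms buffer
print(playout_buffer_ms(100, 50))
# Loose setup: 1 s averaging period, 100 ms jitter -> 1375.0 ms buffer
print(playout_buffer_ms(1000, 100))
```

This is why a short encoder averaging period matters so much: it is the floor below which no player-side tuning can push the playout buffer.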


6. Displaying


This step is similar to the first and is mentioned only to complete the picture.


To sum up all of the above, here is a real-life example provided by one of our customers. The company built a video streaming system used for remote auction participation and horse-race betting conducted in Australia. Bidders can participate from all over the world, but the main stage, with the best streaming performance, is located in Macau.

The system's configuration is very simple. A hardware encoder publishes the stream to Nimble Streamer via the RTMP protocol over the local network. By default, Nimble Streamer buffers a certain number of frames in order to provide a smooth outgoing stream in case the incoming stream is unstable. If the encoder is connected over the local network, there is no need for that buffering. Setting rtmp_buffer_initial_offset = 0 in the Nimble Streamer config file makes it serve the most recent frame and therefore provides the lowest possible latency.
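The relevant fragment of the Nimble Streamer config file would look like this (only the buffering parameter named above is shown; the surrounding settings are whatever your existing setup uses):

```
# Serve the most recent frame instead of buffering frames for
# smoothing -- the lowest-latency setting for a stable local link
rtmp_buffer_initial_offset = 0
```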

The outgoing stream is pulled by a Raspberry Pi embedded RTMP player and displayed on a large presentation screen in Macau. The ping time from the company's location in Australia to Macau is 141 milliseconds. The Raspberry Pi RTMP player's buffer is set to 300 milliseconds. The resulting end-to-end latency of the 1080p30 stream displayed at the company's Macau location ranges from 500 to 600 milliseconds.
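The customer's numbers add up plausibly in a sketch like the one below. The ping time and player buffer come from the description above; the remaining per-stage estimates are assumptions, not measured values:

```python
# Sketch: end-to-end latency budget for the Australia -> Macau setup.
# "network" and "player buffer" come from the article; the other
# component estimates are illustrative assumptions.

budget_ms = {
    "capture + display":        1,    # hardware steps, negligible
    "hardware encoder":         60,   # ~2 frames at 30 fps, assumed
    "media server (no buffer)": 30,   # transmux only, assumed
    "network":                  141,  # ping figure from the article
    "player buffer (RPi)":      300,  # configured value from the article
}

total = sum(budget_ms.values())
print(f"estimated end-to-end latency: ~{total} ms")  # ~532 ms
```

The estimate lands inside the observed 500-600 ms range, with the playout buffer as the single biggest line item, matching the article's point about step 5.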

As you can see, a low latency video streaming system can be built in a very easy way at a reasonable cost, which consists mostly of hardware expenses. Nimble Streamer is freeware, so you can build your own streaming system using free or inexpensive products for encoding and playback. It may not perform as well as the system described here, but it may well suffice for your case.

Related documentation


Softvelum Low Delay Protocol
Nimble Streamer
Build streaming infrastructure with Nimble Streamer
Live Streaming features in Nimble
Real-Time Messaging Protocol in Nimble Streamer
Building RTMP live streaming network via Nimble Streamer
Larix Broadcaster
Nimble Streamer performance tuning
Live Transcoder for Nimble Streamer

3 comments:

  1. Really good article. I was wondering, then, given that rtsp is not supported on html5, whats a low-latency alternative to HLS that works on web...

  2. Really good question. Low latency is a corner case for segment-based protocols like HLS and DASH. The currently available alternative is RTMP, though WebRTC is what will probably replace it. But we don't know yet.
