March 24, 2013

Improving availability and robustness with DNS failover

Hi,

WMSPanel gathers data from hundreds of Wowza servers and processes data about hundreds of thousands of streams. Since the service must be reliable and available 24/7, we are constantly improving its robustness.

From the very beginning we have used cloud hosting, which lets us easily scale our computing resources and make them more reliable. But every virtual cloud host runs on physical hardware, and all hardware fails sooner or later. There are several ways to guard against that.

We use several virtual servers for each specific part of data processing and storage. For example, we have 2 virtual servers that process sync packets coming from customers' servers. Each virtual server is located on a different hardware unit, so the load is spread across 2 machines, each having 4 CPU cores. Even if the load increases and one of the servers goes down together with its physical hardware unit, we still have enough computing power on the other physical machine to handle incoming requests. And we can easily bring up more servers to keep up with the processing.

The same applies to UI processing. Each incoming request from your browser goes to the load balancer and is then forwarded to either of the 2 front-end servers. So regardless of the load and the availability of either server, there is always one that will respond when you browse the statistics reports.
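To illustrate the idea of forwarding each request to whichever front-end server is available, here is a minimal sketch of round-robin selection that skips unhealthy servers. It is not our actual balancer configuration: the names `make_balancer`, `is_healthy`, `front-1` and `front-2` are hypothetical and used only for this example.

```python
from itertools import cycle
from typing import Callable, Iterable, Optional


def make_balancer(
    backends: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Callable[[], Optional[str]]:
    """Return a function that picks the next healthy backend in rotation."""
    pool = list(backends)
    rotation = cycle(pool)

    def pick() -> Optional[str]:
        # Look at each backend at most once per request, so a fully
        # failed pool yields None instead of looping forever.
        for _ in range(len(pool)):
            candidate = next(rotation)
            if is_healthy(candidate):
                return candidate
        return None

    return pick


# Usage: two hypothetical front-end servers, one of which goes down.
down = set()
pick = make_balancer(["front-1", "front-2"], lambda name: name not in down)
first, second = pick(), pick()   # requests alternate between the servers
down.add("front-1")
after_failure = pick()           # only the surviving server is chosen
```

As long as one server in the pool passes the health check, every request still gets a backend, which is the property described above.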

All other parts of our system are designed the same way to minimize points of failure.

There was just one part that did not have a "hot backup": the load balancer. If it failed, we would be in deep trouble, because every request from either a web browser or a Wowza agent goes through it.

We decided to use DNS failover based on multiple DNS A records. Those records point at 2 identical load balancers, each of which routes traffic to its target as usual. WMSPanel has 2 sources of incoming requests: users' browsers (both PC and mobile) and Wowza server agents.

Let's see what happens if the load balancer at the primary IP listed in the DNS records fails.
  • A browser will just try the secondary IP, as it should already have it in its local DNS cache. This works seamlessly for the end user and is the default behavior of web browsers.
  • The Wowza agent behaves the same way in this situation: it checks DNS for the secondary IP and sends its sync-ups there. Note that you need agent version 1.3.0.0 or later for this to work correctly.
There may be more than two DNS records, so when we need another balancer, even one located in a different data center (e.g., in Latin America or Asia), we'll just add a new record and it will work the same way.
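The client-side behavior described above can be sketched roughly as follows. This is only an illustration of failover across multiple A records, not the actual code of browsers or of the Wowza agent. The function names `resolve_all` and `connect_with_failover` are hypothetical, and the 3-second timeout is an arbitrary assumption.

```python
import socket
from typing import Callable, List, Sequence


def resolve_all(host: str, port: int) -> List[str]:
    """Return every distinct IP the resolver reports for host, in order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for info in infos:
        ip = info[4][0]  # sockaddr tuple starts with the IP string
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_with_failover(
    addresses: Sequence[str],
    port: int,
    connect: Callable = socket.create_connection,
    timeout: float = 3.0,  # assumed value, tune for your setup
):
    """Try each address in turn and return the first successful connection."""
    last_error = None
    for ip in addresses:
        try:
            return connect((ip, port), timeout)
        except OSError as exc:
            # This balancer is unreachable; fall through to the next record.
            last_error = exc
    raise ConnectionError(f"all {len(addresses)} addresses failed") from last_error
```

A caller would combine the two, for example `connect_with_failover(resolve_all(host, 80), 80)`. Adding a third balancer then requires no client change: the new A record simply shows up as one more address in the list.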

Read a complete blog post about WMSPanel architecture and design to see how we provide high availability and robustness, with DNS failover being a part of it.

If you haven't yet used our service, don't wait and try it now for 2 weeks free of charge.
