July 17, 2015

Protecting media links from web scraping

Security and billing are often considered the most important tasks for any media business. It's hard to imagine a company that doesn't care about protecting its own content, which is in fact the basis of the company's revenue. Unprotected content can be used by others for commercial purposes, and that means lost profit for the content owner.

The WMSPanel team provides its customers with a set of paywall features for Nimble Streamer and Wowza, designed to solve these tasks. The best-known feature is hotlink protection, which protects specified links from re-streaming and further re-use. In brief, the media URL is accompanied by a special signature containing the viewer's IP address and the period of time the URL is valid for. So a protected URL can't be used to retrieve media content from any other IP address, and it expires after a predefined period of time. Perfect!
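To make the idea concrete, here is a minimal Node.js sketch of how a web server might sign a media URL for one viewer. It assumes a signature made of `server_time`, `hash_value` and `validminutes` fields, Base64-encoded into the `wmsAuthSign` parameter that appears in the sample URL later in this post. The hash input (viewer IP, secret key, server time and valid minutes, hashed with MD5) and the time format are assumptions for illustration, so check the Hotlink protection documentation before relying on them; the key, IP and URL are placeholders.

```javascript
const crypto = require("crypto");

// Format a date as "M/D/YYYY h:mm:ss AM" in UTC, e.g. "7/17/2015 7:54:42 AM".
// Assumption: this is the server-time format the media server expects.
function formatServerTime(date) {
  const pad = (n) => String(n).padStart(2, "0");
  const h24 = date.getUTCHours();
  const h12 = h24 % 12 === 0 ? 12 : h24 % 12;
  const ampm = h24 < 12 ? "AM" : "PM";
  return `${date.getUTCMonth() + 1}/${date.getUTCDate()}/${date.getUTCFullYear()} ` +
         `${h12}:${pad(date.getUTCMinutes())}:${pad(date.getUTCSeconds())} ${ampm}`;
}

// Build a URL bound to the viewer's IP address and valid for `validMinutes`.
// `secretKey` is a placeholder for the key configured on the media server side.
function signMediaUrl(baseUrl, viewerIp, secretKey, validMinutes, now = new Date()) {
  const serverTime = formatServerTime(now);
  // Hash IP, key, time and validity period together (assumed hash input).
  const hashValue = crypto
    .createHash("md5")
    .update(viewerIp + secretKey + serverTime + validMinutes)
    .digest("base64");
  const signature =
    `server_time=${serverTime}&hash_value=${hashValue}&validminutes=${validMinutes}`;
  // Base64-encode the whole signature, then percent-encode it so '+', '/'
  // and '=' survive as part of the query string.
  const encoded = Buffer.from(signature).toString("base64");
  return `${baseUrl}?wmsAuthSign=${encodeURIComponent(encoded)}`;
}

const url = signMediaUrl(
  "http://live1.mysite.tv:3456/myapp/mystream1/playlist.m3u8",
  "203.0.113.7", "my-secret-key", 5, new Date(Date.UTC(2015, 6, 17, 7, 54, 42)));
console.log(url);
```

Because the IP address and the time window are inside the hash, a copied URL fails both when requested from another address and after the window closes.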

However, our customers sometimes report that hotlink protection is broken because they've seen an application playing their protected streams. The application is usually Kodi, SimpleTV or PlaylisTV, but that isn't a full list. The reported case looks like re-streaming, but in fact it's not. These applications don't produce the media URL; they scrape it from a web page, which in most cases is accessible to everyone. Roughly speaking, they act as a "man in the middle": they request the web page from the viewer's IP address, take the URL from that page and paste it into their own media player.
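To see why a statically assigned link is easy prey, here is a minimal sketch of what such a scraper does in principle: it takes the HTML fetched on the viewer's behalf and pulls out anything that looks like an HLS playlist URL with a regular expression. The function name and the pattern are hypothetical and are not taken from any of the applications mentioned.

```javascript
// Extract the first absolute .m3u8 URL (with optional query string) from page HTML.
// Hypothetical scraper logic, for illustration only.
function scrapeM3u8Url(html) {
  const match = html.match(/https?:\/\/[^\s"'<>]+\.m3u8(?:\?[^\s"'<>]*)?/);
  return match ? match[0] : null;
}

// A page that assigns the signed URL statically to the player.
const staticPage =
  '<video src="http://live1.mysite.tv:3456/myapp/mystream1/playlist.m3u8?wmsAuthSign=abc123="></video>';
console.log(scrapeM3u8Url(staticPage));
```

The signature is picked up along with the URL, and since the scraper fetched the page from the viewer's own IP address, hotlink protection accepts it. A pattern like this finds nothing on a page where the URL is only assembled at run time by JavaScript, which is what the obfuscation techniques in this post aim for.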

So, nothing to worry about? Not exactly.
A content owner may earn part of their income from advertising, may want to show press releases to the audience, or anything else. These applications prevent the owner from doing any of that. That's where a new task appears: protecting media links from web scraping.

And it turns out not to be easy. The only reliable way to protect streams from such applications is to use website authorization together with a tracking system on the media server side, such as the WMSPanel pay-per-view framework. This way, the content owner can monitor every user, detect those with too much activity (too many views or too much view time) and block them.
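The detection part can be as simple as accumulating per-user statistics and flagging outliers. The sketch below assumes that view sessions are reported as `{ userId, seconds }` records; the record shape and both thresholds are hypothetical and would need tuning against real audience data. This is not the WMSPanel pay-per-view API, only an illustration of the decision logic.

```javascript
// Hypothetical daily limits: a normal viewer is unlikely to exceed these.
const MAX_VIEWS_PER_DAY = 50;
const MAX_SECONDS_PER_DAY = 12 * 60 * 60;

// Aggregate reported sessions per user and return IDs exceeding either limit.
function findUsersToBlock(sessions) {
  const stats = new Map();
  for (const { userId, seconds } of sessions) {
    const s = stats.get(userId) || { views: 0, seconds: 0 };
    s.views += 1;
    s.seconds += seconds;
    stats.set(userId, s);
  }
  const blocked = [];
  for (const [userId, s] of stats) {
    if (s.views > MAX_VIEWS_PER_DAY || s.seconds > MAX_SECONDS_PER_DAY) {
      blocked.push(userId);
    }
  }
  return blocked;
}

// "bot" opens 60 short sessions; "alice" watches two long ones.
const sessions = [
  { userId: "alice", seconds: 3600 },
  { userId: "alice", seconds: 1800 },
  ...Array.from({ length: 60 }, () => ({ userId: "bot", seconds: 30 })),
];
console.log(findUsersToBlock(sessions)); // [ 'bot' ]
```

Counting both views and total time catches scrapers that reopen the stream constantly as well as accounts shared so widely that their combined view time is implausible for one person.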

A simpler approach for the media owner, though less convenient for viewers, is to add a CAPTCHA or reCAPTCHA dialog to the page containing the media player. The web server then sets up the player with a valid media URL only after the viewer completes the challenge, which is simple for a human but very hard for any automated system.
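The server-side gate might look like the following Node.js sketch: the player URL is returned only after the CAPTCHA token submitted by the page has been verified. Verification is passed in as a function; with reCAPTCHA it would POST the `secret` and `response` fields to Google's `siteverify` endpoint and read the `success` flag from the JSON reply. `getPlayerUrl` and `signUrl` are hypothetical names standing in for your own handler and URL-signing code.

```javascript
// Verifier for reCAPTCHA: asks Google whether the submitted token is valid.
// Requires Node.js 18+ for the global fetch; `secret` is your reCAPTCHA secret key.
async function verifyWithRecaptcha(secret, token) {
  const res = await fetch("https://www.google.com/recaptcha/api/siteverify", {
    method: "POST",
    body: new URLSearchParams({ secret, response: token }),
  });
  const data = await res.json();
  return data.success === true;
}

// Return a signed media URL only when the CAPTCHA check passes; otherwise null,
// so a visitor who skipped the challenge never receives a playable link.
async function getPlayerUrl(token, verify, signUrl) {
  if (!token || !(await verify(token))) {
    return null;
  }
  return signUrl();
}
```

In a real handler, `verify` would be something like `(token) => verifyWithRecaptcha(SECRET, token)`, and `signUrl` would produce the hotlink-protected URL for the requesting IP address.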

However, neither approach is always applicable. Some customers really want their pages to be accessible without authorization or typing a CAPTCHA, but still want to protect them from URL scraping. In this case, various obfuscation techniques can be used. The point is to generate the media URL on the web page instead of assigning it statically. Of course, these techniques don't guarantee that the protected URL can't be retrieved, but they can make that task very hard, especially if the obfuscation scheme changes periodically.
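One cheap way to make the scheme change on every page render is to encode the URL with a random key chosen by the server each time and emit a small decoder next to it. The Node.js sketch below shifts every character code by a random per-render offset; the function names are hypothetical, and this is plain obfuscation, not encryption: anyone who runs or reads the decoder can recover the URL.

```javascript
const crypto = require("crypto");

// Encode a URL as character codes shifted by a random per-render offset.
function encodeForPage(url) {
  const offset = crypto.randomInt(1, 1000);
  const codes = Array.from(url, (ch) => ch.charCodeAt(0) + offset);
  return { offset, codes };
}

// The same decoding logic the page runs in the browser.
function decodeOnPage(offset, codes) {
  return String.fromCharCode(...codes.map((c) => c - offset));
}

// Emit a JavaScript snippet for the page; offset and codes differ on every call,
// so a scraper tuned to one rendering does not match the next one.
function renderDecoderScript(url) {
  const { offset, codes } = encodeForPage(url);
  return `var u = String.fromCharCode.apply(null, [${codes.join(",")}]` +
         `.map(function (c) { return c - ${offset}; }));`;
}
```

The URL never appears as text in the page, and since the numbers change on every render, a scraper cannot simply store a fixed pattern; it has to execute or interpret the script.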

The basic approach is to use JavaScript to generate the media URL and to store part of the URL in a hidden element of the web page, ideally surrounded by similar fake elements to defeat simple pre-parsing. Another part can be embedded in the JavaScript code, for example as an array of strings that is then joined.
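On the server, markup of this kind can be generated for each request, with fresh IDs and variable names every time. The Node.js sketch below splits the signature into two parts: the first is cut into three-character chunks for a JavaScript array, mixed with shuffled decoy copies of that array; the second goes into a hidden `span` mixed with decoy spans holding shuffled characters. The function names, ID length and chunk size are illustrative choices, not the code from the WMSPanel sample repository.

```javascript
const crypto = require("crypto");

// Random letters-only identifier for element IDs and variable names.
function randomId(length = 18) {
  const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  let id = "";
  for (let i = 0; i < length; i++) id += letters[crypto.randomInt(letters.length)];
  return id;
}

// Fisher-Yates shuffle on a copy of the array.
function shuffle(items) {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = crypto.randomInt(i + 1);
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Split the signature and emit hidden spans plus chunk arrays, both with decoys.
function renderObfuscatedSignature(signature) {
  const cut = Math.floor(signature.length / 2);
  const head = signature.slice(0, cut);
  const tail = signature.slice(cut);
  const chunks = head.match(/.{1,3}/g) || [];

  const realSpanId = randomId();
  const spans = shuffle([
    `<span style="display:none" id="${realSpanId}">${tail}</span>`,
    `<span style="display:none" id="${randomId()}">${shuffle([...tail]).join("")}</span>`,
    `<span style="display:none" id="${randomId()}">${shuffle([...tail]).join("")}</span>`,
  ]);

  const realArray = randomId();
  const arrays = shuffle([
    `var ${realArray} = ${JSON.stringify(chunks)};`,
    `var ${randomId()} = ${JSON.stringify(shuffle(chunks))};`,
    `var ${randomId()} = ${JSON.stringify(shuffle(chunks))};`,
  ]);

  // Only the real array and the real span ID are referenced by the accessor.
  const script = arrays.join("\n") +
    `\nfunction getSignature() { return ${realArray}.join("") + ` +
    `document.getElementById("${realSpanId}").textContent; }`;
  return { spans: spans.join("\n"), script, realArray, realSpanId };
}
```

Regenerating names and decoy order on every request means the real element cannot be found by a fixed ID or position; the scraper has to follow the script's references.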

<span style="display:none" id="ghniBkSitfacreteus">Ut3lzmz0WGmh5WTFWWsS5aavQdNTmP5=WTUORUbVZtcI1RPd</span>
<span style="display:none" id="ftkgieaechSisruBnt">vdzcW105Fbl=TUsGR3RIWtmZThmPSmWUUWadTtWON5zQ5PVa</span>
<span style="display:none" id="gkSeirausinBathtfc">bmtWSW5QOUIvWlRzcTh5NUZ3PT0mdmFsaWRtaW51dGVzPTU=</span>

<script type="text/javascript">
    // Decoy arrays: the same chunks as the real one, in shuffled order.
    var leatarnAreesuSrbUrigy = ["aF9", "8xN", "ydm", "Q25", "x1Z", "c6N", "2YW", "I",
                                 "3Rp", "T1N", "VyX", "c2V", "GFz", "0ma", "1ID", "TQ6",
                                 "9Ny", "gQU", "y8y", "NDI", "bWU", "MDE"];
    var bteulrSeariAresyraUgn = ["Q25", "c6N", "bWU", "1ID", "MDE", "gQU", "c2V", "y8y",
                                 "NDI", "0ma", "TQ6", "8xN", "ydm", "x1Z", "T1N", "GFz",
                                 "2YW", "9Ny", "3Rp", "I", "VyX", "aF9"];
    // Real array: the first part of the signature, chunks in their correct order.
    var atgauAUesreiynrbSlrre = ["c2V", "ydm", "VyX", "3Rp", "bWU", "9Ny", "8xN", "y8y",
                                 "MDE", "1ID", "c6N", "TQ6", "NDI", "gQU", "0ma", "GFz",
                                 "aF9", "2YW", "x1Z", "T1N", "Q25", "I"];

    function getHttpUrl() {
        // Base URL built from single characters, so it never appears as a string literal.
        return ( ["h", "t", "t", "p", ":", "\/", "\/", "l", "i", "v", "e", "1", ".",
                  "m", "y", "s", "i", "t", "e", ".", "t", "v", ":", "3", "4", "5", "6",
                  "\/", "m", "y", "a", "p", "p", "\/", "m", "y", "s", "t", "r", "e",
                  "a", "m", "1", "\/", "p", "l", "a", "y", "l", "i", "s", "t", ".",
                  "m", "3", "u", "8", "?", "w", "m", "s", "A", "u", "t", "h",
                  "S", "i", "g", "n", "="].join("") +
                  // First part of the signature: the real chunk array, joined.
                  atgauAUesreiynrbSlrre.join("") +
                  // Second part: the content of the real hidden span.
                  document.getElementById("gkSeirausinBathtfc").innerHTML);
    }
</script>

So a scraping application would have to use an HTML parser and analyze the JavaScript code to get the URL. Further steps could include spreading the logic across multiple JavaScript files, using AJAX, or even building a custom player, to make scraping hard and annoying.
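As an example of the AJAX step, the page can request the signature from the web server only when playback starts, so the signature never appears in the initial HTML. The sketch below assumes a hypothetical endpoint `/stream-signature` returning JSON like `{ "sign": "..." }`; the endpoint and field name are placeholders for your own backend, and `fetchImpl` is a parameter only so the function can be tested outside a browser.

```javascript
// Request a fresh signature from the (hypothetical) endpoint and build the media URL.
// `fetchImpl` defaults to the global fetch available in browsers and Node.js 18+.
async function loadMediaUrl(baseUrl, fetchImpl = fetch) {
  const res = await fetchImpl("/stream-signature");
  if (!res.ok) {
    throw new Error(`signature request failed: ${res.status}`);
  }
  const { sign } = await res.json();
  // Percent-encode the signature so Base64 characters survive in the query string.
  return `${baseUrl}?wmsAuthSign=${encodeURIComponent(sign)}`;
}
```

The endpoint can apply its own checks (session cookie, referrer, request rate) before answering, which gives the server one more place to refuse automated clients.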

You can download the full sample code from our GitHub repository.

The described approach works for both Nimble Streamer and Wowza media servers. If you use Nimble Streamer, you can also apply JavaScript obfuscation to the stream-based signature, which provides a unique signature for every stream.



Besides hotlink protection, WMSPanel provides several more security features, such as geo-location and IP range restrictions, domain lock, HLS AES encryption for DRM, and more.


Related documentation

WMSPanel, Nimble Streamer, Hotlink protection for Nimble Streamer, Paywall for Nimble Streamer, Paywall FAQ
