Typical corporate firewall policies block all UDP traffic, and stricter ones additionally allow only a limited number of ports, such as HTTPS (443) or DNS (53). This inevitably leads to problems for media traversal across such restrictive firewalls.
This write-up proposes solutions to the problem of running a seamless WebRTC or real-time multimedia session across any restrictive firewall. It explains the problems that WebRTC-based media traversal can face despite its built-in protocols.
WebRTC is really an umbrella suite of protocols, which relies on 4 different protocols to enable RTC on the Web.
The following sections describe some hard firewall topologies only, ignoring the topologies (like Symmetric NAT, Full-cone etc.), which are already addressed by the vanilla WebRTC suite of protocols.
A typical corporate firewall configuration is to block out all the UDP traffic. This directly affects the Media transportation since SRTP is transported over UDP.
In general, this is easily mitigated by adding a TURN server, assuming that the topology allows the STUN and TURN protocols to operate (both can run over TCP).
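As an illustration of the TURN mitigation, the sketch below builds an ICE server list that offers TURN over both UDP and TCP. The host name `turn.example.com`, the credentials and the function name `buildTcpFallbackConfig` are assumptions made for this example. The interfaces are local copies of the shape of the browser's `RTCConfiguration`, so the snippet also type-checks outside a browser.

```typescript
// Local mirror of the browser's RTCIceServer / RTCConfiguration shapes.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}
interface IceConfig {
  iceServers: IceServer[];
}

// Hypothetical TURN host; replace with your own deployment.
const TURN_HOST = "turn.example.com";

function buildTcpFallbackConfig(username: string, credential: string): IceConfig {
  return {
    iceServers: [
      // STUN still helps on networks that do not block UDP.
      { urls: [`stun:${TURN_HOST}:3478`] },
      {
        urls: [
          `turn:${TURN_HOST}:3478?transport=udp`,
          // TURN allocation requested over TCP, for firewalls that drop UDP.
          `turn:${TURN_HOST}:3478?transport=tcp`,
        ],
        username,
        credential,
      },
    ],
  };
}

const tcpFallbackConfig = buildTcpFallbackConfig("demo-user", "demo-secret");
// In a browser: const pc = new RTCPeerConnection(tcpFallbackConfig);
console.log(JSON.stringify(tcpFallbackConfig, null, 2));
```

The ICE agent tries every listed URL, so keeping the UDP entry costs nothing on open networks while the TCP entry covers the UDP-blocked case.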
Only Two Ports Allowed
An even stricter configuration than the above (No UDP) could additionally have only 2 open ports:
1. HTTPS (443)
2. DNS (53)
This scenario will block not only the Media (which would be established much later in the session negotiation), but also STUN & TURN. The ICE exchange would not complete successfully in this scenario.
A simple mitigation for such a scenario is to run the STUN & the TURN Server on 443 and configure the messages to travel on TLS/TCP. Generally, the firewalls would allow this since they are unable to peek into the packets (due to TLS) and assume that this is HTTPS traffic.
However, in this case as well, the Media would have to fall back to being relayed over TURN (also running on 443).
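One way to check whether a session has really fallen back to relayed media is to inspect the ICE candidates that were gathered. The sketch below is a minimal parser, assuming candidate strings in the standard `candidate:` attribute format. `parseCandidate` and `onlyRelayCandidates` are illustrative helper names rather than part of any WebRTC API, and the addresses are documentation examples.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  foundation: string;
  component: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: CandidateType;
}

const CANDIDATE_TYPES: string[] = ["host", "srflx", "prflx", "relay"];

function isCandidateType(s: string): s is CandidateType {
  return CANDIDATE_TYPES.indexOf(s) !== -1;
}

function parseCandidate(line: string): ParsedCandidate | null {
  // Accept both "candidate:..." and the SDP form "a=candidate:...".
  const body = line.trim().replace(/^a=/, "");
  if (body.indexOf("candidate:") !== 0) return null;
  const f = body.slice("candidate:".length).split(/\s+/);
  // Field order: foundation component transport priority address port "typ" type
  if (f.length < 8 || f[6] !== "typ") return null;
  const type = f[7];
  if (!isCandidateType(type)) return null;
  return {
    foundation: f[0],
    component: Number(f[1]),
    transport: f[2].toLowerCase(),
    priority: Number(f[3]),
    address: f[4],
    port: Number(f[5]),
    type,
  };
}

// True when at least one candidate parsed and every parsed one is a TURN relay.
function onlyRelayCandidates(lines: string[]): boolean {
  const parsed = lines
    .map(parseCandidate)
    .filter((c): c is ParsedCandidate => c !== null);
  return parsed.length > 0 && parsed.every((c) => c.type === "relay");
}

const gathered = [
  "candidate:1 1 udp 41885439 198.51.100.7 50000 typ relay raddr 0.0.0.0 rport 0",
];
console.log(onlyRelayCandidates(gathered));
```

In a browser, the strings would typically come from the `candidate` property of the objects delivered by the `icecandidate` event.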
If the clients are connecting via VPNs, especially VPN products designed for web anonymity (like TunnelBear), then STUN/TURN may fail altogether, since these products are specifically designed to block STUN/TURN messages. It appears, as of this writing, that they are able to do this even if TLS is used. Given their commitment to anonymity, they perhaps view the STUN/TURN protocols as a means of ‘leaking’ the identity of the endpoint.
Many corporate network deployments use proxies for further security. The most common proxies are HTTP/HTTPS/WebSocket proxies.
Many legacy proxies may not support WebSockets, since they do not handle the WebSocket upgrade properly. This means that in such networks even signaling (assuming it runs over WebSockets) will fail. For such topologies, the signaling itself will have to fall back to REST/HTTPS, which can make a real-time application cumbersome to design and could reduce functionality and performance.
Modern proxy-based topologies will typically support WebSockets, which means the signaling path is clear. However, ICE will fail or give incorrect results.
As of this writing, it appears that the only foolproof way to ensure a successful exchange of media in this topology is to tunnel the media via Secure WebSockets. Note, however, that this will not work in topologies with legacy proxies that do not support the WebSocket upgrade protocol.
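To make the upgrade failure concrete, the sketch below checks whether an HTTP response represents a valid WebSocket upgrade (status 101 plus the `Upgrade` and `Connection` headers) and picks a signaling transport accordingly. It is a simplified illustration: the function names and the `"https-polling"` label are assumptions, the `Sec-WebSocket-Accept` verification is omitted, and a browser does not expose the handshake response to scripts, so a check like this belongs in a diagnostic probe or test harness.

```typescript
type SignalingTransport = "websocket" | "https-polling";

function isWebSocketUpgradeAccepted(
  status: number,
  headers: Record<string, string>
): boolean {
  // Header names are case-insensitive, so normalise them first.
  const lower: Record<string, string> = {};
  Object.keys(headers).forEach((name) => {
    lower[name.toLowerCase()] = headers[name];
  });
  const upgrade = (lower["upgrade"] || "").trim().toLowerCase();
  // Connection may carry several comma-separated tokens, e.g. "keep-alive, Upgrade".
  const connectionTokens = (lower["connection"] || "")
    .toLowerCase()
    .split(",")
    .map((t) => t.trim());
  return (
    status === 101 &&
    upgrade === "websocket" &&
    connectionTokens.indexOf("upgrade") !== -1
  );
}

// Fall back to plain HTTPS requests when the proxy did not complete the upgrade.
function chooseSignalingTransport(
  status: number,
  headers: Record<string, string>
): SignalingTransport {
  return isWebSocketUpgradeAccepted(status, headers) ? "websocket" : "https-polling";
}

// A proxy that answers the upgrade request with an ordinary 200 response.
console.log(chooseSignalingTransport(200, { "Content-Type": "text/html" }));
```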
A fall-back method that can be used for very restrictive networks (e.g. UDP blocked and symmetric NAT) is to configure a TURN server to be accessible over TLS on port 443 or TCP over port 80.
Port 443 is reserved for TLS traffic, which is encrypted. This means that beyond the destination of the data, the firewall we’re dealing with can’t know anything about what’s being sent or where, so it will usually treat it as HTTPS traffic and will just pass it along.
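The fall-back described above might look as follows on the client. The sketch lists a TURN server on TLS port 443 and TCP port 80 and can force relay-only operation through `iceTransportPolicy`. The endpoint values and the names `TurnEndpoint` and `buildRestrictiveNetworkConfig` are assumptions for this example; the returned object follows the shape of the browser's `RTCConfiguration`.

```typescript
interface TurnEndpoint {
  host: string;
  username: string;
  credential: string;
}

interface RelayConfig {
  iceServers: { urls: string[]; username: string; credential: string }[];
  iceTransportPolicy: "all" | "relay";
}

function buildRestrictiveNetworkConfig(ep: TurnEndpoint, forceRelay: boolean): RelayConfig {
  return {
    iceServers: [
      {
        urls: [
          // "turns:" means TURN over TLS; on port 443 a port filter sees it as HTTPS.
          `turns:${ep.host}:443?transport=tcp`,
          // Plain TURN over TCP on port 80.
          `turn:${ep.host}:80?transport=tcp`,
        ],
        username: ep.username,
        credential: ep.credential,
      },
    ],
    // "relay" restricts ICE to relay candidates, skipping host and
    // server-reflexive candidates that cannot work on such a network.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

const restrictiveConfig = buildRestrictiveNetworkConfig(
  { host: "turn.example.com", username: "demo-user", credential: "demo-secret" },
  true
);
// In a browser: const pc = new RTCPeerConnection(restrictiveConfig);
console.log(restrictiveConfig.iceServers[0].urls.join("\n"));
```

Forcing `"relay"` is optional. Leaving the policy at `"all"` lets ICE still pick a direct path on networks where one exists.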
The following existing applications may be using the TURN over TLS/TCP fall-back:
1. Hangouts Meet
2. Google AppRTC
3. Jitsi Meet
4. Facebook Messenger
This solution can fail for some topologies.
A. Web Proxy
As discussed in the previous section, some corporate networks may have a proxy in addition to a firewall. A Web Proxy, being a man-in-the-middle, will open all packets passing through it, as it must do so to set up the proxy circuit correctly. The proxy may decide to drop the TURN packets, since they do not conform to the HTTPS protocol specification.
B. VPN Applications
Applications which provide users with anonymity via VPNs will likely reject any STUN/TURN packets as these ‘leak’ information regarding the network endpoints of the consumer.
PROS AND CONS
WebSockets can be used to transfer the media, although this is certainly not an optimal choice. It is similar to using TURN/TCP in WebRTC: it has a quality impact and will not work well in quite a number of cases.
The primary advantage of sending media over WebSockets is that it might pass firewalls where even TURN/TCP and TURN/TLS are blocked. It also avoids the issue of WebRTC TURN connections not getting past authenticated proxies.
HIGH LEVEL ARCHITECTURE
On the rendition (receive) side, data received on the WebSocket goes into a WebAssembly (WASM) based decoder. Audio is fed to an AudioWorklet in browsers that support it. From there the decoded audio is played using the WebAudio “magic” destination node. The video is painted on an HTML5 canvas.
On the transmission side, WebAudio captures the audio obtained from the getUserMedia call, which is sent to a WebAssembly encoder worker and then delivered via WebSocket. Video frames are grabbed from a canvas before being sent to the WebAssembly encoder.
Using WebAssembly, it is possible to create Web versions of commonly available, open source codecs (Opus, H.264 etc) which encode/decode and also RTP packetize the media.
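To illustrate the RTP packetization step, the sketch below writes and reads the fixed 12-byte RTP header (version 2, no padding, no extension, no CSRC entries) around an encoded payload. In the architecture described here this logic would live inside the WebAssembly codec module; TypeScript is used only to show the byte layout. The names `packetizeRtp` and `parseRtp` are illustrative, and payload type 111 is an assumed dynamic value of the kind often negotiated for Opus.

```typescript
interface RtpHeaderFields {
  payloadType: number;    // 7 bits
  marker: boolean;        // 1 bit
  sequenceNumber: number; // 16 bits
  timestamp: number;      // 32 bits
  ssrc: number;           // 32 bits
}

const RTP_HEADER_LENGTH = 12;

function packetizeRtp(h: RtpHeaderFields, payload: Uint8Array): Uint8Array {
  const packet = new Uint8Array(RTP_HEADER_LENGTH + payload.length);
  const view = new DataView(packet.buffer);
  view.setUint8(0, 0x80); // V=2, P=0, X=0, CC=0
  view.setUint8(1, (h.marker ? 0x80 : 0) | (h.payloadType & 0x7f));
  // DataView writes big-endian by default, which is network byte order.
  view.setUint16(2, h.sequenceNumber & 0xffff);
  view.setUint32(4, h.timestamp >>> 0);
  view.setUint32(8, h.ssrc >>> 0);
  packet.set(payload, RTP_HEADER_LENGTH);
  return packet;
}

function parseRtp(packet: Uint8Array): { header: RtpHeaderFields; payload: Uint8Array } {
  if (packet.length < RTP_HEADER_LENGTH) throw new Error("packet too short");
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const b0 = view.getUint8(0);
  if (b0 >> 6 !== 2) throw new Error("not RTP version 2");
  // This sketch only handles the plain header produced by packetizeRtp.
  if ((b0 & 0x3f) !== 0) throw new Error("padding, extension or CSRC not supported");
  const b1 = view.getUint8(1);
  return {
    header: {
      marker: (b1 & 0x80) !== 0,
      payloadType: b1 & 0x7f,
      sequenceNumber: view.getUint16(2),
      timestamp: view.getUint32(4),
      ssrc: view.getUint32(8),
    },
    payload: packet.subarray(RTP_HEADER_LENGTH),
  };
}

const rtpPacket = packetizeRtp(
  { payloadType: 111, marker: false, sequenceNumber: 1, timestamp: 960, ssrc: 0x1234abcd },
  new Uint8Array([1, 2, 3])
);
// In a browser, with ws.binaryType = "arraybuffer": ws.send(rtpPacket);
console.log(rtpPacket.length);
```

Because a WebSocket preserves message boundaries, each RTP packet can be sent as one binary message without extra length framing.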
On the server side, if inter-working with vanilla WebRTC is required, then an inter-working function would need to be implemented which will convert WebSocket transported media to SRTP based media. This inter-working function could feed the WebRTC Media Server (Jitsi, RED5 or others).
Zoom appears to be following this or a similar methodology for its browser based offerings.
As of this writing, there appears to be no known or understood topology for which this solution will not work.
PROS AND CONS