Polling, WebSockets, and SSE

Up until recently, I thought that the difference between long and short polling was just the interval of the polls.

However, that is not the case. There is a more objective difference: where the “waiting” happens.

What is short polling?

In short polling, the client sends requests at a fixed interval, asking the server for new updates. The server replies immediately every time, even if there are no new updates.

Let’s imagine a real-world scenario: you are creating a chatbot for a messaging platform like Discord, and you want to know when you receive a new message. One way you could do it is by sending a request like GET /messages/?unread=true every 500 ms or so.

The server would then reply with all of the messages that you (the chatbot) haven’t read yet.
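The loop described above can be sketched roughly like this. It is only an illustration: `fetch_unread` is a hypothetical stand-in for whatever performs GET /messages/?unread=true, and `max_polls` exists just so the sketch terminates.

```python
import time
from typing import Callable, List


def short_poll(fetch_unread: Callable[[], List[str]],
               interval_s: float = 0.5,
               max_polls: int = 5) -> List[str]:
    """Call fetch_unread() every interval_s seconds and collect the replies.

    fetch_unread is a hypothetical callable standing in for the HTTP request.
    """
    received: List[str] = []
    for attempt in range(max_polls):
        # The server answers right away, even when there is nothing new,
        # so many of these calls return an empty list.
        received.extend(fetch_unread())
        if attempt < max_polls - 1:
            time.sleep(interval_s)  # the waiting happens on the client
    return received
```

Note that the waiting (`time.sleep`) sits on the client side, and the number of requests grows with time rather than with the number of messages.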

[Figure: short polling]

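This is the simplest form of polling, but it is also expensive: most requests come back empty, yet each one still costs a full HTTP round trip. If the connection isn’t kept alive between polls, each request also pays the TCP (and TLS) handshake overhead of establishing a new connection.

Thus, short polling is usually a poor choice when updates need to arrive quickly.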

What is long polling?

In long polling, however, the client sends the request, and the server holds it open until there are new updates (normally with a configured timeout).

In the previous scenario, you would send the GET /messages/?unread=true request, and the server would only reply once a new message arrives. After each reply (or timeout), you send the next request right away.

[Figure: long polling]

This one is better since you only send one request per batch of updates (or per timeout) instead of one per interval, which makes it much lighter on resources.
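Here is a minimal sketch of the server side of long polling, using only asyncio from the standard library. The `inbox` queue and the `long_poll` name are assumptions made for the example; a real server would run something like this inside its request handler.

```python
import asyncio
from typing import List


async def long_poll(inbox: asyncio.Queue, timeout_s: float = 30.0) -> List[str]:
    """Hold the request until a message arrives or the timeout expires."""
    try:
        # The waiting happens here, on the server, not on the client.
        first = await asyncio.wait_for(inbox.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Timeout: reply with nothing so the client can re-issue the request.
        return []
    messages = [first]
    # Drain anything else that is already queued, without blocking again.
    while not inbox.empty():
        messages.append(inbox.get_nowait())
    return messages
```

Compared with short polling, the request only returns when there is something to say or when the timeout is reached.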

Where long polling is used

There is a commonly used technology that makes use of long polling: Kafka.

In Kafka, the consumer polls the broker for new messages, and the broker can be told to respond only once a minimum number of bytes is available (fetch.min.bytes) or once a time limit has passed (fetch.max.wait.ms), whichever comes first.

Being configurable like this, while staying light on resources, is the benefit of long polling.
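The “minimum size or time limit” rule can be written down as a tiny function. This is not Kafka’s code, just an illustration of the decision; the parameter names and defaults are modelled on the consumer settings fetch.min.bytes and fetch.max.wait.ms, and `should_respond` is a name made up for this sketch.

```python
def should_respond(buffered_bytes: int, waited_ms: int,
                   min_bytes: int = 1, max_wait_ms: int = 500) -> bool:
    """Decide whether a held fetch request should be answered now.

    Illustrative only: reply as soon as enough data has accumulated,
    or when the request has been held for the maximum wait time.
    """
    return buffered_bytes >= min_bytes or waited_ms >= max_wait_ms
```

Raising `min_bytes` trades latency for fewer, larger responses, while `max_wait_ms` caps how long a request can be held.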

Better alternatives for websites

For websites/browsers, however, polling has largely been replaced by WebSockets and Server-Sent Events (SSE), which are better alternatives for those cases.

WebSockets (WS/WSS)

WebSockets is a protocol, distinct from HTTP, in which a single connection is established and kept open, with support for a bidirectional flow of messages. The connection starts as an HTTP request and is then upgraded.

Thus, both the server and the client can send messages to each other.

[Figure: WebSockets message flow]

The different colors are used to indicate that the messages are independent of one another.

The handshake is a GET request with the following key headers:

Upgrade: websocket
Connection: Upgrade

Then the server response is:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade

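Those are the key headers. The client also sends a random Sec-WebSocket-Key, which the server answers with a matching Sec-WebSocket-Accept. Another key one is the optional “Sec-WebSocket-Protocol”, which specifies the subprotocol to be used for the messages exchanged.

From this point on, the protocol used on the connection is WebSockets.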
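One detail of the handshake that is easy to show in code: the client sends a Sec-WebSocket-Key header, and the server proves it understood the upgrade by replying with Sec-WebSocket-Accept, computed as defined in RFC 6455. The function name below is my own; the GUID is the fixed value from the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    # SHA-1 over the key concatenated with the GUID, then base64-encoded.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

With the sample key from the RFC, `websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")` gives `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.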

Server-sent events (SSE)

Server-sent events are, just like the name suggests, messages that are sent straight by the server to the client, without the client having to ask for each one.

The functional difference between this and WebSockets is that, after the client connects to the server, the flow is unidirectional, coming only from the server.

A key technical difference is that SSE uses the HTTP protocol. This is a big advantage, and we’ll see why later.

[Figure: server-sent events]

Although the client can’t send messages over the SSE connection, it can still send them via normal HTTP requests.
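Since SSE is plain HTTP, the server just writes text to a response with Content-Type: text/event-stream. As a rough sketch, this is how a single event could be formatted on the wire; `format_sse` is a helper name I made up for the example.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Format one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload becomes one "data:" line per line of text.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"
```

On the browser side, an EventSource reads this stream and fires one event per blank-line-terminated block.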

Which one to choose: WebSockets or SSE?

One might find SSE useless since with WebSockets the client can make use of the connection to send messages.

However, since SSE relies on HTTP, it is much less likely to be blocked by firewalls, and it has more support from Web Application Firewalls (WAFs).

Security

Browsers have built-in security measures for HTTP, whereas WebSockets, being a separate protocol, need some extra practices on the server side.

One example is that the WebSocket handshake is not restricted by the Same-Origin Policy, which allows an attacker to:

  1. send you a link to their website, which contains a malicious JS script
  2. that script sends the WS handshake to Discord from your browser (so it can carry your session cookies)
    • if this was an SSE request, the browser wouldn’t let the script read the response unless Discord allowed it via CORS (due to the Same-Origin Policy)
  3. Discord receives that request
    • if the Discord server doesn’t validate the Origin header: the attacker can read the messages you receive and send them to their server
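The server-side check from the last step could look something like the following. It is a sketch under my own assumptions: `headers` is a plain mapping of the handshake request headers, and the allow-list is something you maintain yourself.

```python
from typing import Iterable, Mapping


def is_allowed_origin(headers: Mapping[str, str],
                      allowed_origins: Iterable[str]) -> bool:
    """Return True only if the handshake's Origin header is on the allow-list."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    origin = lowered.get("origin")
    # Assumption: a missing Origin is rejected. Non-browser clients may not
    # send one, so a real server might choose a different policy here.
    return origin is not None and origin in set(allowed_origins)
```

The server would run this before replying with 101 Switching Protocols, and refuse the upgrade when it returns False.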

Check this article for more security practices.

SSE limitations

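Browsers limit the number of concurrent HTTP connections per domain, and the exact limit depends on the browser, so SSE is affected by this. Over HTTP/1.1 the limit is low (typically 6 per domain) and each open SSE stream uses one of those slots; over HTTP/2, streams are multiplexed over a single connection, so this is much less of a problem.

SSE also isn’t able to transmit binary data, since the data sent must be UTF-8 encoded text. A workaround is to use an encoding such as base64, but the client then needs to decode it on its end.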
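The base64 workaround is short enough to sketch. Both function names are made up for the example, and the decoder stands in for what the client would do after receiving the event.

```python
import base64


def encode_binary_event(payload: bytes) -> str:
    """Wrap raw bytes as a single SSE event with a base64 text payload."""
    text = base64.b64encode(payload).decode("ascii")
    return f"data: {text}\n\n"


def decode_binary_event(event: str) -> bytes:
    """Reverse of encode_binary_event, as the client would do on its end."""
    prefix = "data: "
    body = event.strip()
    if not body.startswith(prefix):
        raise ValueError("not a single-line SSE data event")
    return base64.b64decode(body[len(prefix):])
```

Keep in mind that base64 makes the payload roughly a third larger, which is part of why WebSockets tend to be the easier choice for binary data.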

Conclusion

To be honest, the choice here comes down to which one is easiest for you to support:

  1. which one does your framework/language support? what about your WAF?
  2. do you intend to send binary data to the client?
    • if so, WS is probably the easiest choice
  3. can firewalls be an issue?
    • normally only hardened corporate ones are; if so, then SSE is the way to go

What about SPDY, Server Push…?

SPDY was very quickly made obsolete, and HTTP/2 Server Push is being removed from Chrome, so I don’t think they are worth going over.
