Head-of-line blocking

In computer networking, head-of-line blocking (HOL blocking) refers to a performance bottleneck that occurs when a queue of packets is held up by the first packet in the queue, even though other packets in the queue could be processed.

In HTTP/1.1, requests on a single TCP connection are usually sent sequentially: a new request can't be made on the connection while the client is waiting for the response to the previous request. Clients commonly open several TCP connections to the same server to work around this, but because the number of connections is limited, requests can still queue behind a slow one and HOL blocking can still occur.

HTTP/1.1 defines an optional feature called HTTP pipelining, which attempted to work around HOL blocking by allowing requests to be sent without waiting for earlier responses. The attempt was unsuccessful: the design of HTTP/1.1 requires responses to be returned in the same order as the requests were received, so a single request that takes a long time to complete still holds up every response behind it. Network conditions such as congestion, packet loss (and the resulting TCP retransmissions), or TCP slow start can also delay transmission and cause later responses to be blocked by earlier ones.
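The effect of the in-order response rule can be illustrated with a small Python sketch. It is a simplified model, not a description of any real server: it assumes the server starts work on all pipelined requests at time 0 and ignores network transfer time. The function names and the processing times are hypothetical.

```python
from itertools import accumulate


def pipelined_delivery_times(processing_times):
    """Delivery time of each response when responses must be sent in
    request order, as with HTTP/1.1 pipelining.

    Assumption: the server processes all requests in parallel from t=0
    and transfer time is negligible.
    """
    # A response cannot be sent before every earlier response has been
    # sent, so its delivery time is the running maximum of finish times.
    return list(accumulate(processing_times, max))


def unordered_delivery_times(processing_times):
    """Delivery times if each response could be sent as soon as it is ready."""
    return list(processing_times)


# Hypothetical processing times in seconds: one slow request, two fast ones.
times = [5.0, 0.1, 0.2]
print(pipelined_delivery_times(times))  # [5.0, 5.0, 5.0]
print(unordered_delivery_times(times))  # [5.0, 0.1, 0.2]
```

In this model the two fast responses are ready almost immediately but are not delivered until the slow first response has been sent.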

HTTP/2 reduces application-level HOL blocking by introducing request and response multiplexing. With this feature, multiple requests and responses can be interleaved over a single TCP connection using independently numbered streams, and stream prioritization helps the server decide which streams to send first. However, packet loss at the transport layer can still cause HOL blocking across streams because HTTP/2 runs over TCP: a lost TCP segment can stall all streams carried on that connection until the lost data is retransmitted.
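As a rough illustration of multiplexing, the following Python sketch interleaves frames from several streams onto one connection in simple round-robin order. This is an assumption made for the example: a real HTTP/2 implementation also takes priorities and flow control into account. The function name `interleave_frames` and the payloads are hypothetical.

```python
from itertools import zip_longest


def interleave_frames(streams):
    """Interleave frames from several streams in round-robin order.

    `streams` maps a stream ID to the list of frame payloads queued on
    that stream. Returns the (stream_id, payload) pairs in the order they
    would be written to the shared connection.
    """
    ordered = []
    # Each round takes at most one frame from every stream; streams that
    # have run out of frames contribute None and are skipped.
    for frames_in_round in zip_longest(*streams.values()):
        for stream_id, payload in zip(streams, frames_in_round):
            if payload is not None:
                ordered.append((stream_id, payload))
    return ordered


# Hypothetical streams; client-initiated HTTP/2 streams use odd IDs.
queued = {1: ["a1", "a2", "a3"], 3: ["b1"], 5: ["c1", "c2"]}
print(interleave_frames(queued))
# [(1, 'a1'), (3, 'b1'), (5, 'c1'), (1, 'a2'), (5, 'c2'), (1, 'a3')]
```

The short response on stream 3 completes after the first round instead of waiting for all of stream 1 to be sent.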

HTTP/3 removes this transport-layer HOL blocking by running over QUIC, which is built on UDP instead of TCP. QUIC provides multiple independent streams and delivers each stream's data in order independently of the others, so a lost packet delays only the streams whose data it carried rather than the entire connection.
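The difference between one ordered byte stream (as in TCP) and independently ordered streams (as in QUIC) can be sketched in Python. This is a toy model under stated assumptions: each packet carries data for exactly one stream, arrival times are given in milliseconds, and a lost packet is represented simply by a late arrival time after retransmission. The function name and the sample data are hypothetical.

```python
def delivery_times(packets, per_stream):
    """Time at which each packet's data can be handed to the application.

    `packets` is a list of (stream_id, arrival_time_ms) in send order.
    With per_stream=False, ordering is enforced across the whole
    connection (TCP-like). With per_stream=True, ordering is enforced
    only within each stream (QUIC-like).
    """
    latest = {}  # ordering scope -> latest arrival time seen so far
    delivered = []
    for stream_id, arrival in packets:
        scope = stream_id if per_stream else "connection"
        # Data is released only once everything sent earlier in the same
        # ordering scope has arrived.
        latest[scope] = max(latest.get(scope, 0), arrival)
        delivered.append(latest[scope])
    return delivered


# The third packet (stream "A") is lost and its retransmission arrives at 100 ms.
packets = [("A", 10), ("B", 11), ("A", 100), ("B", 13), ("C", 14)]
print(delivery_times(packets, per_stream=False))  # [10, 11, 100, 100, 100]
print(delivery_times(packets, per_stream=True))   # [10, 11, 100, 13, 14]
```

In the TCP-like case the data for streams "B" and "C" waits for the retransmission even though it has already arrived. In the QUIC-like case only stream "A" is delayed.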

See also