How to easily solve CMTS Bufferbloat

How to easily solve CMTS Bufferbloat
It's not the buffering, it's the latency!
July 23, 2014 (updated 07/25/2014) - Technology Blog Index
duckware.com/bufferbloat

1. The CMTS Bufferbloat problem

Bufferbloat: Bufferbloat is a well documented problem with how the Internet works. When buffering in network equipment increases, latency and jitter increases, and can cause other 'interactive' applications (VoIP, etc) to experience horrible latency, and fail. It is important to understand that bufferbloat can happen even in completely uncongested networks with no packet loss.

An prime example: On a completely unused 3.4Mbps broadband cable modem connection, and an uncongested network, place a VoIP phone call. The VoIP phone call works incredibly well (using around 87kbps - source). However, while on the VoIP phone call, initiate a download of a large file on a remote server with an 800k TCP receive window. That receive window size causes the CMTS (the cable company equipment that communicates with your cable modem) buffer to fill up to near 800k within a second or two, introducing a 1750 millisecond (1.75 seconds!) latency into the VoIP phone call, which makes using VoIP all but impossible. The VoIP phone call is very effectively killed. [Source: actual testing on Comcast's network in GA, while monitoring CMTS RTT times]

What the experts say: To solve this problem, experts say that we must reduce buffering and actually intentionally drop more packets (introduce packet loss; like RED, or Random Early Detection). [Source: A senior VP within Comcast's CTO office told me this].

The wrong problem is being solved in the wrong way: So, solve the problem by introducing packet loss? But the queue, or buffering, was never the problem (in the example above, there is NO congestion, or overflow of buffers). The entire problem is latency. So why not solve the real problem, latency.

The real problem: The real problem with Bufferbloat in the CMTS is latency, caused by a single FIFO buffer. Using the prime example above, when packets are received for the VoIP phone call (seen in red below), the packed is queued up and must wait for 1750 milliseconds in the CMTS FIFO buffer behind download data (seen in blue below):

A CMTS with a single FIFO queue for a cable modem is the PROBLEM

2. The CMTS Bufferbloat solution

The crazy simple solution: Rather than implementing a single FIFO queue, the CMTS must implement a FIFO queue per "entity", and these multiple FIFO queues are then serviced in a round-robin manner, sending packets to the cable modem. All queues share a single common buffer. The "entity" could be as fine grained as per TCP connection (strongly discouraged), but a more reasonable choice for a CMTS is server IP address (something ALL IP packets have).

The SOLUTION is for the CMTS to use one FIFO queue for each server

The latency problem is absolutely solved! And no packets are dropped or lost. The large file download, rather than slowing down in order to empty the CMTS queue, actually operates full blast and simply shares the connection with any VoIP traffic once it arrives.

Namely, with the download going full blast, and the VoIP phone call active, most of the time that you look inside the CMTS, you will only see the single download queue. But every 20ms, the CMTS receives a VoIP packet, which is placed into its own queue, and because the head of each queue is serviced round-robin and sent to the cable modem, the VoIP queue is immediately emptied, and there is once again only the download queue remaining. Now instead of the VoIP packet experiencing a 1475ms delay inside a single shared queue, the VoIP packet will only experience a couple of milliseconds delay within its own unique queue, before being sent to the cable modem.

More intelligent packet drops: And if the shared buffer fills up due to actual congestion, packets can be more intelligently dropped -- like from the head of the longest individual FIFO queue (the server responsible for hogging the most bandwidth). In the example above, during actual congestion in the CMTS, the VoIP phone call would have NO packets dropped (with a queue of only one packet at any given time), but the download (with a much larger queue) would experience a dropped packet and be forced to slow down.

A FIFO queue per server: We expressly do not create a queue per TCP connection, as that would inappropriately reward programs that open up lots of connections to the same server (what most web browsers do and many 'download managers').

Increased priority for new queues: An optional optimization is to tag each queue with a creation time. Any queue less than a certain age (like 100 ms) could be assumed to be (more than likely) the result of user interaction, and given a higher priority (like maybe twice as much opportunity in the round-robin queue). This very effectively lowers the priority of queues that saturate the bandwidth, and rewards queues that use a tiny slice of the bandwidth. This simple change rewards non-bandwidth hogs (like DNS queries, VoIP traffic, and other interactive applications).

It is important to understand that a queue 'exists' within the CMTS for as long as data remains buffered from a server. When all the buffered data has been sent to the cable modem, the queue goes away. And when data starts up again, a 'new' queue is created, and gets a very temporary boost in priority, once again.

A very easy way to implement this 'priority' is via byte counting. Queues can transmit B×P bytes before yielding to the next queue (round robin). B is some constant number (like the MSS), and 'P' is the priority factor (like 1 for normal priority, and 2 for higher priority).

In Summary: In a CMTS with a single FIFO queue, high bandwidth applications dramatically and very negatively affect all other Internet traffic. Latency can increase very quickly and to a very high level (well over a second). Whereas, in a CMTS with a multiple FIFO queues per "entity", high bandwidth applications are allowed to run full-tilt, but are preempted by the needs of other Internet applications. Latency is always relatively bounded. This improved algorithm is incredibly easy to understand, and therefore likely to be actually implemented, and used.

3. The analogy to CPU scheduling

Scheduling packets within the CMTS for delivery to a cable modem is exactly analogous to an OS scheduling tasks on a CPU. The CPU is the broadband bandwidth, and the tasks are IP packets.

4. Learn more

http://www.bufferbloat.net

Some content may be copyrighted by their respective owners and is reproduced
in this paper under Title 17 Section 107 (Fair Use) of the copyright code.
The rest of this document is Copyright © 2014-2024 Jerry Jongerius