How to easily solve CMTS Bufferbloat|
It's not the buffering, it's the latency!
July 23, 2014 (updated 07/25/2014) - Technology Blog Index
1. The CMTS Bufferbloat problem
Bufferbloat: Bufferbloat is a
problem with how the Internet works. When buffering in network equipment increases,
latency and jitter increases, and can cause other 'interactive' applications
(VoIP, etc) to experience horrible latency, and fail.
It is important to understand that bufferbloat
can happen even in completely uncongested networks with no packet loss.
What the experts say: To solve this problem, experts say that we must reduce
buffering and actually intentionally drop more packets (introduce packet loss; like
RED, or Random
Early Detection). [Source: A senior VP within Comcast's CTO office told me this].
An prime example: On a completely unused 3.4Mbps broadband cable modem connection, and
an uncongested network, place a VoIP phone call. The VoIP phone call works incredibly well (using around
87kbps - source).
However, while on the VoIP phone call, initiate a download of a large file on a
remote server with an 800k TCP receive window. That receive window size causes
the CMTS (the cable company equipment that communicates with your cable modem) buffer to
fill up to near 800k within a second or two, introducing a 1750 millisecond (1.75 seconds!) latency
into the VoIP phone call, which makes using VoIP all but impossible. The VoIP phone call is
very effectively killed. [Source: actual testing on Comcast's network in GA, while
monitoring CMTS RTT times]
The wrong problem is being solved in the wrong way: So, solve the problem by
introducing packet loss? But the queue, or buffering, was never the problem
(in the example above, there is NO congestion, or overflow of buffers). The entire
problem is latency. So why not solve the real problem, latency.
The real problem: The real problem with Bufferbloat in the CMTS is latency, caused
by a single FIFO buffer. Using the prime example above, when packets are received for the
VoIP phone call (seen in red below), the packed is queued up and must wait for 1750
milliseconds in the CMTS FIFO buffer behind download data (seen in blue below):
A CMTS with a single FIFO queue for a cable modem is the PROBLEM
2. The CMTS Bufferbloat solution
The crazy simple solution: Rather than implementing a single FIFO queue,
the CMTS must implement a FIFO queue per "entity", and these multiple FIFO queues
are then serviced in a round-robin manner, sending packets to the cable modem.
All queues share a single common buffer. The "entity" could be as fine grained
as per TCP connection (strongly discouraged), but a more reasonable choice for
a CMTS is server IP address (something ALL IP packets have).
The latency problem is absolutely solved! And no packets are dropped or lost.
The large file download, rather than slowing down in order to empty the CMTS queue,
actually operates full blast and simply shares the connection with any VoIP traffic once
The SOLUTION is for the CMTS to use one FIFO queue for each server
Namely, with the download going full blast, and the VoIP phone call active, most of
the time that you look inside the CMTS, you will only see the single download queue. But
every 20ms, the CMTS receives a VoIP packet, which is placed into its own queue, and
because the head of each queue is serviced round-robin and sent to the cable modem, the
VoIP queue is immediately emptied, and there is once again only the download queue remaining.
Now instead of the VoIP packet experiencing a 1475ms delay inside a single shared queue, the VoIP
packet will only experience a couple of milliseconds delay within its own unique queue, before being sent
to the cable modem.
More intelligent packet drops: And if the shared buffer fills up
due to actual congestion, packets can be more intelligently dropped -- like from
the head of the longest individual FIFO queue (the server
responsible for hogging the most bandwidth). In the example above, during
actual congestion in the CMTS, the VoIP phone call would have NO packets dropped
(with a queue of only one packet at any given time), but the download (with a much larger
queue) would experience a dropped packet and be forced to slow down.
A FIFO queue per server: We expressly do not create a queue per
TCP connection, as that would inappropriately reward programs that open up lots
of connections to the same server (what most web browsers do and many 'download
Increased priority for new queues: An optional
optimization is to tag each queue with a creation time. Any
queue less than a certain age (like 100 ms) could be assumed to be
(more than likely) the result of user interaction, and given a higher
priority (like maybe twice as much opportunity in the round-robin queue).
This very effectively lowers the priority of queues
that saturate the bandwidth, and rewards queues that use a tiny
slice of the bandwidth. This simple change rewards non-bandwidth
hogs (like DNS queries, VoIP traffic, and other interactive applications).
A very easy way to implement this 'priority' is via byte counting. Queues can
transmit B×P bytes before yielding to the next queue (round robin). B is
some constant number (like the MSS), and 'P' is the priority factor (like 1 for
normal priority, and 2 for higher priority).
It is important to understand that a queue 'exists' within the CMTS for as long as
data remains buffered from a server. When all the buffered data has been sent
to the cable modem, the queue goes away. And when data starts up again, a
'new' queue is created, and gets a very temporary boost in priority, once again.
In a CMTS with a single FIFO queue, high bandwidth applications dramatically and
very negatively affect all other Internet traffic. Latency can increase
very quickly and to a very high level (well over a second).
Whereas, in a CMTS with a multiple FIFO queues per "entity", high bandwidth
applications are allowed to run full-tilt, but are preempted by the needs
of other Internet applications. Latency is always relatively bounded.
This improved algorithm is incredibly easy to understand, and therefore likely
to be actually implemented, and used.
3. The analogy to CPU scheduling
Scheduling packets within the CMTS for delivery to a cable modem is exactly analogous
to an OS scheduling tasks on a CPU. The CPU is the broadband bandwidth, and the tasks
are IP packets.