Hierarchical Fair Service Curve (HFSC) of OpenBSD

Quality of Service (QoS) is an attempt to give priority to a packet type or data connection on a per session basis. Hierarchical Fair Service Curve takes QoS to the next level over CBQ by focusing on guaranteed real-time, adaptive best-effort, and hierarchical link-sharing service.

Though this may sound difficult, it is really easy to use once you understand the basics.

What HFSC means, without the technical jargon, is that you can set up rules to govern how data leaves the system. For example:

- You may choose to have ack packets labeled with the highest priority to guarantee those packets go out first. Ack packets are the way you tell the remote system you have received the latest payload and to continue to send the next. This will make sure your data transfers go as fast as they can even on a saturated connection.

- What if you are an avid gamer and other users on your network are slowing your connection down or causing you to lose your connection? You can choose to give priority to your gaming traffic over normal web traffic. This way you can play games with low latency while other users on the network browse the web and download files.

- What if you are running a web server and you find the majority of your data is text based and is less than 10KB per page, but you do have a few larger data files around 5MB? You decide you want to serve out data quickly at the beginning of the connection and slow down after a few seconds. This is called a nonlinear service curve (NLSC or just SC). You can set up HFSC to serve out the first few seconds of a connection at full speed, let's say 100KB/sec, and then slow the connection down after 5 seconds to 25KB/sec. This allows you to serve out your HTML page at full speed and still allow people to download the 5MB files at a slower speed, saving bandwidth for other web clients.
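To put rough numbers on the web server example, here is a minimal Python sketch. It assumes a single transfer that gets exactly the curve's rate (100KB/sec for the first 5 seconds, then 25KB/sec) and takes 1MB as 1000KB. The function name is ours, not part of pf.

```python
def transfer_seconds(size_kb, m1_kb_s, d_s, m2_kb_s):
    """Seconds to send size_kb at m1_kb_s for the first d_s seconds, then m2_kb_s."""
    burst_kb = m1_kb_s * d_s  # data that fits into the initial fast phase
    if size_kb <= burst_kb:
        return size_kb / m1_kb_s
    # The rest of the transfer runs at the slower long-term rate.
    return d_s + (size_kb - burst_kb) / m2_kb_s


page = transfer_seconds(10, 100, 5, 25)        # 10KB HTML page
big_file = transfer_seconds(5000, 100, 5, 25)  # 5MB file (assuming 1MB = 1000KB)
print(page, big_file)
```

Under these assumptions the small page never leaves the fast phase, while the large file spends nearly all of its time at the slower rate.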

Quality of Service gives you the tools you need to shape traffic.

NOTE: Calomel.org also offers an OpenBSD Pf Firewall "how to" ( pf.conf ) if you need it. It integrates this HFSC scheduler example into a working pf.conf rule set.

Getting Started (HFSC basics)

Let's take a look at the basic set of directives in HFSC and why you would use them in the real world:

bandwidth: has two (2) slightly different meanings depending on whether it is defined in the parent or child queue lines. Take care not to confuse the two.

In the parent "altq" line this defines the maximum bit rate for all queues on an interface. This is the total aggregate upload bandwidth allowed by the ISP, not the negotiated speed of the NIC. It is important to specify a value slightly less than the maximum amount of upload bandwidth available so altq queues the data and not some upstream router (96% of the upload speed of 1000kbit/sec in our example).

In the child "queue" line(s) this directive specifies the maximum bit rate to be processed by the queue at any one time. This directive is actually the same as using "linkshare" with a value in m2. This value must not exceed the value of the parent queue and can be specified as an absolute value or a percentage of the parent queue's bandwidth. If not specified, this defaults to 100% of the parent queue's bandwidth. It is advisable to assign a percentage of your total bandwidth to each child queue, up to a total of no more than 100%.
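As a quick sanity check on the "no more than 100%" advice, the following Python sketch adds up the child percentages from the example rule set later in this guide and converts each one to an absolute rate. It assumes percentages are taken from the 960Kb parent; the variable names are ours.

```python
PARENT_KBIT = 960  # parent "altq" bandwidth from this guide's example

# Child "bandwidth" percentages from the example rule set.
child_percent = {
    "ack": 30, "dns": 5, "ssh": 20, "bulk": 20,
    "web": 5, "mail": 5, "bittor": 1, "spamd": 1,
}

total_percent = sum(child_percent.values())
assert total_percent <= 100, "child queues oversubscribe the parent"

# Absolute rate each percentage works out to, in kilobits per second.
child_kbit = {name: PARENT_KBIT * pct / 100 for name, pct in child_percent.items()}

for name, kbit in child_kbit.items():
    print(f"{name:7s} {kbit:6.1f} Kb")
print("total:", total_percent, "%")
```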

priority: the level specifies the order in which a queue is serviced relative to other queues. The higher the number, the higher the priority. This directive is a simple way of specifying which packets are first out of the gate compared to others. Priority does _not_ define an amount of bandwidth, but the order in which packets are buffered before being sent out of the interface. For example, say you have gaming data and bulk web data. You want gaming data to go first since it is interactive, and bulk web traffic can wait. Set the gaming data queue at least one (1) priority level higher than the bulk web traffic queue. The priority range for HFSC and cbq is 0 to 7 and for priq the range is 0 to 15. Priority 0 is the lowest priority, for the least important data. When not specified, a default of 1 is used. Priq type queues with a higher priority are always served first. Cbq and HFSC type queues with a higher priority are served first if the link is saturated and the "realtime" bandwidth is also exhausted.

qlimit: the number of "slots" available to a queue to hold outgoing packets when the available bandwidth has been exceeded. This value is 50 by default. When the total bandwidth has been reached on the outgoing interface, or higher queues are taking up all of the bandwidth, no more data can be sent. The queue then puts the packets it can not send into slots in memory in the order that they arrive. When bandwidth becomes available the slots are emptied in the order they were filled; first in, first out (FIFO). If every slot up to the qlimit is full, new packets are dropped. Look at qlimit slots as "emergency use only," but as a better alternative to dropping the packets outright. Also, do not think that setting the qlimit really high will solve the problem of bandwidth starvation and packet drops. What you want to do is set up a queue with the proper bandwidth boundaries so that packets only sit in the qlimit slots for a short time (no more than a few seconds), if ever.
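The slot behavior described above can be modeled in a few lines of Python. This is only a sketch of a bounded FIFO with tail drop, not pf's actual implementation. The class and function names are ours, and the 1500-byte packet size and 288Kb rate (30% of 960Kb) are assumptions picked for illustration.

```python
from collections import deque


class BoundedQueue:
    """FIFO with a fixed number of slots; arrivals are dropped when it is full."""

    def __init__(self, qlimit=50):
        self.qlimit = qlimit
        self.slots = deque()
        self.dropped = 0

    def enqueue(self, packet):
        if len(self.slots) >= self.qlimit:
            self.dropped += 1  # no free slot, so the new packet is dropped
            return False
        self.slots.append(packet)
        return True

    def dequeue(self):
        # The oldest packet leaves first (FIFO); None when the queue is empty.
        return self.slots.popleft() if self.slots else None


def drain_seconds(qlimit, packet_bytes, rate_kbit):
    """Time to empty a completely full queue at rate_kbit (1 Kb = 1000 bits)."""
    return qlimit * packet_bytes * 8 / (rate_kbit * 1000)


q = BoundedQueue(qlimit=3)
results = [q.enqueue(n) for n in range(5)]  # packets 3 and 4 find no free slot
first_out = q.dequeue()
worst_case = drain_seconds(500, 1500, 288)  # 500 full-size packets at 288 Kb
print(results, q.dropped, first_out, round(worst_case, 1))
```

With these assumed numbers, a full 500-slot queue of large packets would take roughly 20 seconds to drain, which is why a big qlimit on its own is no cure for a queue that is short on bandwidth.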

realtime: the amount of bandwidth that is guaranteed to the queue no matter what any other queue needs. Realtime can be set from 0% to 80% of total connection bandwidth. Let's say you want to make sure that your web server gets 25KB/sec of bandwidth no matter what. Setting the realtime value will give the web server queue the bandwidth it needs even if other queues want to share its bandwidth.
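Here is a small Python sketch of the arithmetic behind a realtime setting. It converts the 25KB/sec web server example into kilobits and into a percentage of a 960Kb link, then checks that the realtime guarantees from the example rule set later in this guide stay within the 80% limit. We count the web queue at its initial 10% rate, which is our assumption about how to total a service curve.

```python
PARENT_KBIT = 960  # parent "altq" bandwidth from this guide's example


def kbytes_to_kbit(kbytes_per_sec):
    """Convert kilobytes/sec to kilobits/sec (8 bits per byte)."""
    return kbytes_per_sec * 8


web_kbit = kbytes_to_kbit(25)                   # the 25KB/sec guarantee
web_percent = web_kbit * 100 / PARENT_KBIT      # same value as a percentage

# Realtime percentages from the example rule set (web counted at its 10% peak).
realtime_percent = {"ack": 20, "dns": 5, "ssh": 20, "bulk": 20, "web": 10, "mail": 5}
realtime_total = sum(realtime_percent.values())
within_limit = realtime_total <= 80

print(web_kbit, round(web_percent, 1), realtime_total, within_limit)
```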

upperlimit: the amount of bandwidth the queue can _never_ exceed. For example, say you want to set up a new mail server and you want to make sure that the server never takes up more than 50% of your available bandwidth. Or let's say you have a p2p user you need to limit. Using the upperlimit value will keep them from abusing the connection.

linkshare (m2): this value has the exact same use as "bandwidth" above. If you decide to use both "bandwidth" and "linkshare" in the same rule, pf (OpenBSD) will override the bandwidth directive and use "linkshare m2". This may cause more confusion than it is worth especially if you have two different settings in each. For this reason we are not going to use linkshare in our rules. The only reason you may want to use linkshare _instead of_ bandwidth is if you want to enable a nonlinear service curve.

nonlinear service curve (NLSC or just SC): The directives realtime, upperlimit and linkshare can all take advantage of a NLSC. In our example below we will use this option on our "web" queue. The format for service curve specifications is (m1, d, m2). m2 controls the bandwidth assigned to the queue. m1 and d are optional and can be used to control the initial bandwidth assignment. For the first d milliseconds the queue gets the bandwidth given as m1, afterwards the value given in m2.

default: the default queue. Any data connections or rules that are not specifically put into a queue will be put into this queue. This directive must appear in exactly one queue rule. You can _not_ have two (2) default directives in any two (2) rules.

The HFSC queue format

Now, let's take a look at a custom HFSC queue setup. The following group of rules splits data into 8 subsets and gives each one of them specific data tasks and limits. You do not have to follow this example exactly, especially since you have the definitions above. Let's go through what each line does and why it is used, then you can decide for yourself.

Cut and paste this set if you want. It works perfectly fine.

# Comcast Upload = 1000Kb/s (queue at 96%)
 altq on $ExtIf bandwidth 960Kb hfsc queue { ack, dns, ssh, web, mail, bulk, bittor, spamd }
  queue ack        bandwidth 30% priority 8 qlimit 500 hfsc (realtime   20%)
  queue dns        bandwidth  5% priority 7 qlimit 500 hfsc (realtime    5%)
  queue ssh        bandwidth 20% priority 6 qlimit 500 hfsc (realtime   20%) {ssh_login, ssh_bulk}
   queue ssh_login bandwidth 50% priority 6 qlimit 500 hfsc
   queue ssh_bulk  bandwidth 50% priority 5 qlimit 500 hfsc
  queue bulk       bandwidth 20% priority 5 qlimit 500 hfsc (realtime   20% default)
  queue web        bandwidth  5% priority 4 qlimit 500 hfsc (realtime  (10%, 2000, 5%) )
  queue mail       bandwidth  5% priority 3 qlimit 500 hfsc (realtime    5%)
  queue bittor     bandwidth  1% priority 2 qlimit 500 hfsc (upperlimit 98%)
  queue spamd      bandwidth  1% priority 1 qlimit 500 hfsc (upperlimit 1Kb)

Definitions: step by step

1. #Comcast Upload = 1000Kb/s (queue at 96%)

The first line is simply a comment. It reminds one that Comcast's total upload bandwidth is 1000Kb/s (kilobits per second). You never want to use exactly the total upload speed, but a few kilobytes per second less. On Comcast 96% works very well.

Why? You want to use your queue as the limiting factor in the connection. When you send out data and you saturate your link, the router you connect to will decide what packets go first, and that is what we want HFSC to do. You can _not_ trust your upstream router to queue packets correctly.

So, we limit the upload speed to just under the total available bandwidth. "Doesn't that waste some bandwidth then?" Yes, in this example we are not using 5KB/s, but remember we are making sure the upstream routers send out the packets in the order we want, not the order they decide. This makes all the difference with ACK packets and will actually increase the usable bandwidth on a saturated connection.
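The 5KB/s figure comes straight from the numbers in the comment line. A minimal Python sketch of that arithmetic, with a function name of our own choosing:

```python
def shaped_rate(upload_kbit, percent=96):
    """Return the rate to give altq and the unused headroom, both in kilobits/sec."""
    altq_kbit = upload_kbit * percent / 100
    return altq_kbit, upload_kbit - altq_kbit


altq_kbit, headroom_kbit = shaped_rate(1000)  # 1000Kb/s upload, queued at 96%
headroom_kbyte = headroom_kbit / 8            # 8 bits per byte
print(altq_kbit, headroom_kbit, headroom_kbyte)
```

For a different upload speed, pass your own value in kilobits per second and use the first number on the altq line.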

2. altq on $ExtIf bandwidth 960Kb hfsc queue { ack, dns, ssh, web, mail, bulk, bittor, spamd }

The second line is the parent queue for the external interface ($ExtIf). It shows we are using "hfsc queue" and lists all eight (8) of the child queues (ack, dns, ssh, web, mail, bulk, bittor, spamd). This is where we specify the bandwidth limit: 96% of the total 1000Kb = 960Kb.

The next set of lines specify the eight(8) child queues and also two sub-child queues in the ssh rule. All of these rules use the external interface and are limited by the parent queue's bandwidth limitations.

REMEMBER: Do not set your upload bandwidth too high otherwise the queue in pf will be useless. A safe rule is to set the maximum bandwidth at around 96% of the total upload speed available to you. Setting your max speed lower is preferable to setting it too high.

3. queue ack bandwidth 30% priority 8 qlimit 500 hfsc (realtime 20%)

This is the ack queue. It can process as much as 30% of the total link bandwidth, it is the highest priority at 8, and it has a very high queue limit of 500 slots. The realtime of 20% means this queue is guaranteed at least 20% of the total bandwidth no matter what any other rule wants.

The highest priority queue is for ack (acknowledge) packets. Ack packets are the way your system tells the remote servers you have received the payload they sent and to send the next one. By prioritizing these packets you can keep your transfer rates high even on a highly saturated link. For example, if you are downloading a file and you receive a chunk of data, the remote system will not send you the next chunk of data until you send them an OK. The OK is the ack packet. When you send the ack packet the remote system knows you got the packet and it has checked out, thus it will send the next one. If on the other hand you delay ack packets, the transfer rate will diminish quickly because the remote system will not send anything new until you respond.

4. queue dns bandwidth 5% priority 7 qlimit 500 hfsc (realtime 5%)

This is the dns queue. It is allowed to process as much as 5% of the total bandwidth, it is the second highest priority at 7, and it has a high queue limit of 500 slots. The realtime of 5% means this queue is guaranteed at least 5% of the total bandwidth no matter what any other rule wants.

This queue is simply to make sure dns packets get out on time. Though this is not really necessary, your web browsing users will be thankful. When you go to a site or enter a URL the clients need the IP of the server. This rule simply allows dns queries to go out before other traffic.

5. queue ssh bandwidth 20% priority 6 qlimit 500 hfsc (realtime 20%) {ssh_login, ssh_bulk}
queue ssh_login bandwidth 50% priority 6 qlimit 500 hfsc
queue ssh_bulk bandwidth 50% priority 5 qlimit 500 hfsc

These are the ssh parent and child queues. The parent queue can process as much as 20% of the total bandwidth, it is at priority 6, and it has a very high queue limit of 500 slots. The realtime of 20% means this queue is guaranteed at least 20% of the total bandwidth.

The two (2) child queues are for ssh's interactive logins (ssh_login) and bulk transfer data like scp/sftp (ssh_bulk). These two queues are under the parent queue and both divide the parent's bandwidth, which is 20% of the total link bandwidth. In this example we want to make sure interactive ssh traffic, like authentication, has at least 50% of that bandwidth. The rest of the bandwidth is used for bulk transfers like scp and sftp. Both child queues do have the ability to share bandwidth with each other. The priorities of the ssh child queues are independent of all of the other queues. We could have picked any other priorities as long as ssh_login was higher than ssh_bulk.
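Because the child percentages are relative to the ssh parent and not to the whole link, it can help to work out the absolute numbers. A short Python sketch, assuming each percentage applies to the queue directly above it and the link is the 960Kb from the altq line:

```python
def child_rate(parent_kbit, percent):
    """Rate of a child queue given as a percentage of its parent, in kilobits/sec."""
    return parent_kbit * percent / 100


link_kbit = 960                           # parent "altq" bandwidth
ssh_kbit = child_rate(link_kbit, 20)      # ssh parent queue: 20% of the link
ssh_login_kbit = child_rate(ssh_kbit, 50) # 50% of the ssh parent
ssh_bulk_kbit = child_rate(ssh_kbit, 50)  # the other 50% of the ssh parent

# Each ssh child therefore works out to 10% of the whole link.
login_share_of_link = ssh_login_kbit * 100 / link_kbit
print(ssh_kbit, ssh_login_kbit, ssh_bulk_kbit, login_share_of_link)
```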

Normally only one queue name is given with the queue keyword, but if a second name is specified that queue will be used for packets with a Type of Service (ToS) of low-delay (tos 0x10) and for TCP ACK packets with no data payload. A good example of this is found when using SSH. SSH login sessions will set the ToS to low-delay while SCP and SFTP sessions will not. PF can use this information to queue packets belonging to a login connection in a different queue than non-login connections. This can be useful to prioritize login connection packets over file transfer packets.

REMEMBER: when setting up your pass rules for ssh traffic you need to have the two queues in the correct order. For ssh, the first queue listed is for bulk traffic and the second is for interactive traffic. For example, "queue (ssh_bulk, ssh_login)" is the correct order for your pass rules. Check our OpenBSD Pf Firewall "how to" (pf.conf) Guide for a working example.
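The two-queue selection described above can be pictured with a small Python sketch. This is our own model of the rule as stated in the text, not pf's source code. The function name and arguments are hypothetical, and we assume "low-delay" means the 0x10 bit is set in the ToS field.

```python
TOS_LOWDELAY = 0x10  # low-delay Type of Service value named in the text


def pick_queue(queues, tos=0, tcp_ack=False, payload_len=0):
    """queues is (normal_queue,) or (normal_queue, low_delay_queue)."""
    if len(queues) == 2:
        empty_ack = tcp_ack and payload_len == 0
        if (tos & TOS_LOWDELAY) or empty_ack:
            return queues[1]  # second name: low-delay packets and empty acks
    return queues[0]          # first name: everything else


ssh_queues = ("ssh_bulk", "ssh_login")
print(pick_queue(ssh_queues, tos=0x10))                         # interactive login
print(pick_queue(ssh_queues, tcp_ack=True, payload_len=1400))   # scp/sftp data
print(pick_queue(ssh_queues, tcp_ack=True, payload_len=0))      # bare TCP ack
```

This is why the order inside the parentheses matters: swapping the two names would send the bulk data to the interactive queue.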

6. queue bulk bandwidth 20% priority 5 qlimit 500 hfsc (realtime 20% default)

This is the bulk queue. The bulk queue can process as much as 20% of the total bandwidth, it is at priority 5, and it has a very high queue limit of 500 slots. The realtime of 20% means this queue is guaranteed at least 20% of the total bandwidth no matter what any other rule wants.

This queue is where all of the general traffic will go. If one does not specify a queue for a rule, that traffic will go here. Notice the directive "default" after the realtime tag. You must specify one and only one "default" queue.

7. queue web bandwidth 5% priority 4 qlimit 500 hfsc (realtime (10%, 2000, 5%) )

This queue is an example showing the use of a nonlinear service curve (NLSC or just SC) with the realtime directive. We could assign this queue to the traffic coming into the external network interface and accessing our public web server.

In this example we are using three (3) variables to shape the bandwidth over time. The format for service curve specifications is (m1, d, m2). m2 controls the bandwidth assigned to the queue. This is what we used for all of the previous realtime values. m1 and d are optional and can be used to control the initial bandwidth assignment. For the first d milliseconds the queue gets the bandwidth given as m1, afterwards the value given in m2.

So, our web queue will guarantee bandwidth up to 10% of the parent queue (960/10 or 96 kbits/sec) for at least 2000 milliseconds (2 seconds) after the transfer starts. Then after 2000 milliseconds the guaranteed bandwidth will go down to 5% (48 kbits/sec). This might be useful to keep short interactive transfers fast, but slow down big downloads which might otherwise monopolize your bandwidth.
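To see what the (10%, 2000, 5%) curve adds up to over time, here is a Python sketch of the cumulative guarantee. It assumes the percentages are taken from the 960Kb parent set on the altq line and that the clock starts when the queue first has data to send. The function name is ours.

```python
def guaranteed_kbit(t_ms, m1_kbit, d_ms, m2_kbit):
    """Cumulative kilobits guaranteed t_ms after the queue becomes busy."""
    if t_ms <= d_ms:
        return m1_kbit * t_ms / 1000
    # Full initial phase at m1, then the remainder of the time at m2.
    return m1_kbit * d_ms / 1000 + m2_kbit * (t_ms - d_ms) / 1000


PARENT_KBIT = 960
m1 = PARENT_KBIT * 10 / 100  # initial rate, first 2000 ms
m2 = PARENT_KBIT * 5 / 100   # long-term rate afterwards

after_2s = guaranteed_kbit(2000, m1, 2000, m2)
after_10s = guaranteed_kbit(10000, m1, 2000, m2)
print(m1, m2, after_2s, after_10s)
```

Under these assumptions a third of the data guaranteed in the first 10 seconds arrives in the first 2 seconds, which is the "fast start, slow finish" shape the curve is meant to give.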

8. queue mail bandwidth 5% priority 3 qlimit 500 hfsc (realtime 5%)

This queue can be used for incoming mail server connections once they have passed your spamd checks. It can borrow as much as 5% of the total bandwidth and has a high queue limit of 500 slots. The realtime of 5% means this queue is guaranteed at least 5% of the total bandwidth no matter what any other rule wants.

9. queue bittor bandwidth 1% priority 2 qlimit 500 hfsc (upperlimit 98%)

This is the bittor queue. The bittor queue is assigned 1% of the total bandwidth, it is at priority 2, and it has a very high queue limit of 500 slots. Notice this rule does not have a realtime directive. This is because we have decided that bittor traffic is expendable and we want to make sure this queue gives up all bandwidth when higher priority queues need it. The upperlimit directive makes sure this queue will never use more than 98% of the total bandwidth.

This rule is here to show that one can use peer sharing tools and still have control of the network. You will notice that remote clients connecting with peer-to-peer sharing tools will hammer your connection. This rule will allow the data to transfer at up to 98% of your full speed, but if another queue needs the bandwidth, the bittor queue will be pruned almost instantly to 1%. Imagine you are getting the latest OpenBSD distro through a torrent and then you want to browse the web. Normally, you would experience a slow connection because you are fighting for bandwidth. With this rule your browsing traffic gets the bandwidth it needs instantly. The bittor queue on the other hand gets reduced and starts using the qlimit slots until you are done browsing. Best of both worlds.

10. queue spamd bandwidth 1% priority 1 qlimit 500 hfsc (upperlimit 1Kb)

This is the spamd queue. The spamd queue is assigned 1% of the total parent bandwidth, it is at the lowest priority of 1, and it has a very high queue limit of 500 slots. Notice this rule does not have a realtime directive. This is because we have decided that spamd traffic is expendable and we want to make sure this queue gives up all bandwidth if higher priority queues need it. The upperlimit directive makes sure this queue will _never_ send faster than 1Kb (1 kilobit per second), which is well under 1% of the total bandwidth.

This rule is used for spammers and is linked to the spamd daemon to annoy them. Since this queue has very low traffic requirements, we keep it at 1% of the total bandwidth or less. Even with 100 spammers connected, less than 1KB/sec is more than enough to annoy them. Even if you had more spammers connected the queue would never use more than 1% of the bandwidth. Any extra packets would go into the qlimit slots, and if those fill up the packets would be dropped. No problem, since the data is expendable.

Applying the rules - OpenBSD Pf Firewall "how to" (pf.conf) Guide

Now that we have taken a detailed look at the queue rules and directives, we now need to look at a way to apply those queues to our pf rules.

Here we have two (2) examples of rules you can use queuing on. Notice the queue names we used above, like ack, bulk, ssh_login, and ssh_bulk, at the end of the rules. Also, notice the order that we have put the two queues in on each rule. In "(bulk, ack)" the first queue name, "bulk", is for general data and the second, "ack", is for low-delay (ToS) packets and TCP ack packets with no payload.

pass out on $ExtIf inet proto tcp from ($ExtIf) to any flags S/SA modulate state queue (bulk, ack)
pass out on $ExtIf inet proto tcp from ($ExtIf) to any port ssh flags S/SA modulate state queue (ssh_bulk, ssh_login)

The first rule is passing out bulk traffic on the external interface and prioritizing ack packets. The second rule is passing out data on port 22(ssh) and prioritizing the interactive ssh traffic. This traffic is originating on our internal network or on the firewall itself.

If we decided to have a rule with only one queue directive it would look like this:

pass out on $ExtIf inet proto tcp from ($ExtIf) to any flags S/SA modulate state queue (bulk)

You can also queue data on the return trip of an external stateful connection. Remember you can _not_ queue data coming into the box, only going out. Let's say you have a web server and clients from the outside connect to you and you want their data responses to be queued. The following works perfectly.

pass in on $ExtIf inet proto tcp from any to ($ExtIf) port www flags S/SA modulate state queue (web, ack)

For more information or ideas about CARP (Common Address Redundancy Protocol) check out our OpenBSD Pf / CARP Firewall Failover page.

Verifying the queues

So, now you have read all about queuing and you have applied the queue tags to your rules. Now you need to verify that what you set up actually does what you thought it should do. You should first install "pftop" from the OpenBSD package collection if you are using OpenBSD v4.3 or earlier. If you are using OpenBSD 4.4 or later use the command "systat queues".

The following is an example output from "systat queues".

pfTop: Up Queue 1-9/9, View: queue, Cache: 10000

  QUEUE          BW SCH  PRIO     PKTS    BYTES   DROP_P   DROP_B QLEN BORROW SUSPEN     P/S     B/S
root_rl0       960K hfsc    0        0        0        0        0    0                     0       0
 ack           595K hfsc    8      950     1345        0        0    0                     0       0
 dns          52080 hfsc    7       24      734        0        0    0                     0       0
 ssh          74400 hfsc    6        0        0        0        0    0                     0       0
  ssh_login   66960 hfsc    6       83    13538        0        0    0                   0.2      26
  ssh_bulk     7440 hfsc    5       11     3042        0        0    0                     0       0
 bulk          7440 hfsc    5      406    44540        0        0    0                    80     403
 web           7440 hfsc    4    99406  564454K        0        0    0                   280   34403
 mail          7440 hfsc    2     6409   83402K        0        0    0                     0       0
 bittor        7440 hfsc    2        0        0        0        0    0                     0       0
 spamd         7440 hfsc    1    24424  1412491        0        0  140                    15     923

The output above is similar to what you are looking for. You need to test each type of queue you set up to make sure you see the packets being added to the correct queue. For example, you could ssh to another machine going out the external interface, and as you do so you should see interactive packets (like typing) being added to the "ssh_login" queue. If you scp/sftp a file you should see packets being added to the "ssh_bulk" queue.
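If you save the queue listing to a file, a few lines of Python can point out which queues are backing up and which have seen no traffic at all. This is only a sketch that assumes the column layout of the sample output above (with the BORROW and SUSPEN columns blank, as they are for hfsc there) and plain integer packet counts. The parsing function is ours, not part of pftop or systat.

```python
# A few rows copied from the sample output above.
SAMPLE = """\
 ack           595K hfsc    8      950     1345        0        0    0                     0       0
 web           7440 hfsc    4    99406  564454K        0        0    0                   280   34403
 bittor        7440 hfsc    2        0        0        0        0    0                     0       0
 spamd         7440 hfsc    1    24424  1412491        0        0  140                    15     923
"""


def parse_queue_line(line):
    """Pick out the columns we care about from one queue row."""
    fields = line.split()
    return {
        "queue": fields[0],
        "prio": int(fields[3]),
        "pkts": int(fields[4]),
        "drop_p": int(fields[6]),
        "qlen": int(fields[8]),
    }


rows = [parse_queue_line(line) for line in SAMPLE.splitlines() if line.strip()]

# Queues currently holding packets in their slots, or that have dropped some.
backlogged = [r["queue"] for r in rows if r["qlen"] > 0 or r["drop_p"] > 0]
# Queues that have never matched a packet; worth checking the pass rules.
idle = [r["queue"] for r in rows if r["pkts"] == 0]

print("backlogged:", backlogged)
print("idle:", idle)
```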

Want more speed? Make sure to also check out the Network Speed and Performance Guide. With a little time and understanding you could easily double your firewall's throughput.

Questions?

Do you have a "how to" about PF ?

Yes, we sure do. Check out our OpenBSD Pf Firewall "how to" ( pf.conf ) which covers PF and includes this HFSC quality of service (QoS) setup in the config.

Can you tell me how Comcast throttles its users on their network?

Clarifying Misconceptions of the New Comcast Congestion Mgmt System

I wanted to try to clear up a misconception about how the new Comcast congestion management system works. I believe we have both heard people complain that they fear that they will be unable to use their provisioned speeds during off-peak hours, for example, or at all times of the day, or that users are somehow throttled to a set speed. Neither of these two things are correct.

Part of the problem appears to be confusion over how a user's traffic enters a lower priority QoS state, so I hope to clarify that here. In order for any traffic to be placed in a lower priority state, there must first be relatively high utilization on a given CMTS port. A CMTS port is an upstream or downstream link, or interface, on the CMTS in our network. The CMTS is basically an access network router, with HFC interfaces on the subscriber side, and GigE interfaces on the WAN/Internet side. Today, on average, about 275 cable modems share the same downstream port, and about 100 cable modems share the same upstream port (see page 5 of Attachment B of our Future Practices filing with the FCC, available at http://downloads.comcast.net/docs/Attachment_B_Future_Practices.pdf ).

We define a utilization threshold for downstream and upstream separately. For downstream traffic, a port must average over 80% utilization for 15 minutes or more. For upstream traffic, a port must average over 70% utilization for 15 minutes or more. When one of these threshold conditions has been met, we consider that individual port (not all ports on the CMTS) to be in a so-called "Near Congestion State". This simply means that the pattern of usage is predictive of that network port approaching a point of high utilization, where congestion could soon occur. Then, and only then, do we search the most recent 15 minutes of user traffic on that specific port, in order to determine if a user has consumed more than 70% of their provisioned speed for greater than 15 minutes.

By provisioned speed, we mean the "up to" or "burst to" speed of their service tier. This is typically something like (1) 8Mbps downstream / 2Mbps upstream or (2) 6Mbps downstream / 1Mbps upstream. So how does this work in action?

Let's say that a downstream port has been at 85% utilization for more than 15 minutes. That specific downstream port is identified as being in a Near Congestion State since it exceeded an average of 80% over that time. We then look at the downstream usage of the ~275 cable modems using that downstream port. That port has a mix of users that have been provisioned either 8Mbps or 6Mbps, so 70% of their provisioned speed would be either 5.6Mbps or 4.2Mbps, respectively. So let's use the example of a user with 8Mbps/2Mbps service on this port. In order for their traffic to be marked with a lower priority on this downstream port, they must be consuming 5.6Mbps in the downstream direction for 15 minutes or more, while said port is highly utilized. Once that condition has been met, that user's downstream traffic is now tagged with the lower priority QoS level. This will have *no* effect whatsoever on the traffic of that user, until such time as an actual congestion moment subsequently occurs (IF it even occurs). Should congestion subsequently occur, traffic with a higher priority is handled first, followed by lower priority (and this is not a throttle to X speed).
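The two-step test described above can be written out as a short Python sketch. This is only our reading of the explanation, not Comcast's actual system. The function name is ours, both inputs are assumed to be averages over the same 15-minute window, and we treat a user at exactly 70% of their provisioned speed as over the line.

```python
def is_deprioritized(port_avg_util_pct, user_avg_mbps, provisioned_mbps,
                     downstream=True):
    """True if a user's traffic would be tagged with the lower priority level."""
    # Step 1: the port itself must be in a "Near Congestion State".
    port_threshold_pct = 80 if downstream else 70
    if port_avg_util_pct <= port_threshold_pct:
        return False
    # Step 2: the user must be at 70% or more of their provisioned speed.
    user_threshold_mbps = provisioned_mbps * 70 / 100
    return user_avg_mbps >= user_threshold_mbps


print(is_deprioritized(85, 5.6, 8))   # busy port, heavy 8Mbps user
print(is_deprioritized(85, 4.0, 8))   # busy port, moderate user
print(is_deprioritized(75, 7.0, 8))   # heavy user, but port below 80%
```

Note that a True result only means the traffic is tagged; as the explanation says, the tag has no effect unless congestion actually occurs.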

Questions, comments, or suggestions? Contact Calomel.org