Webserver Optimization and Bandwidth Saving Tips





Running a webserver can be a rewarding experience and also a trial in patience. You want to serve out all your pages and pictures, but you only have a finite amount of bandwidth to do so. If you overload your connection, clients visiting your server will think it is slow and unresponsive. You need to set up your server in the most efficient way possible to get the most visits you can and give your visitors a positive experience. The following are tips on reducing the load on your webserver, speeding up the serving of pages and stopping unwanted and abusive traffic.



Data Compression

Data compression is the process of encoding information using fewer bits (or other information-bearing units) than an un-encoded representation would use, through use of specific encoding schemes. Compressing web pages and other data before they are sent to the client can save your server up to 80% of your current bandwidth.

Apache and lighttpd both offer compression modules: lighttpd has mod_compress, and Apache v2.2 has mod_deflate to compress data on the fly. Both allow you to specify the types of files you want your server to compress. For example, text, HTML and CSS pages compress by as much as 90%. JPGs, on the other hand, are already highly compressed and you will actually make them larger compared to just serving them out as is. By compressing your pages you can save bandwidth and make your pages load faster on client machines due to the lower transmission times.
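
As a rough sketch, a mod_deflate setup in httpd.conf might look like the following. The directives and MIME types shown here are common examples rather than anything prescribed by this guide, so adjust them to your own content:

# Apache 2.2 httpd.conf -- compress text formats, skip already-compressed images
AddOutputFilterByType DEFLATE text/html text/plain text/css text/xml application/x-javascript
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary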

A web server using mod_compress will only compress a page when it is asked for at least once. When the page is requested, the server compresses it and saves the result in a storage area on the server. The page is then sent to the client in compressed form. The compressed file is kept on the server in case anyone else requests it. Very efficient.

The drawback of any type of compression is the time it takes for the server to compress the data and for the receiving client to decompress it. For a standard webserver today the CPU hit of compressing pages before serving them out is minimal; perhaps as little as a few seconds to compress the pages of an entire site.

Typical savings on compressed text files range from 60% to 85%, depending on how redundant the code is. Some JavaScript files can actually be compressed by over 90%. Webmasters who have deployed HTTP compression on their servers report savings of 30% to 50% off their bandwidth bills. Compressed content also speeds up your site by requiring smaller downloads. The cost of decompressing content is small compared to the cost of downloading uncompressed files. On narrowband connections paired with fast computers, CPU speed trumps bandwidth every time.

Take a look at implementing mod_compress or mod_deflate on your webserver.



Expires headers - Caching

The Expires HTTP header is the basic means of controlling caches; it tells all caches how long the object stays fresh. After that time, caches will always check back with the origin server to see if the document has changed. Expires headers are supported by practically every client. You can see the "Expires" date of this page by looking at the "page info" section of your browser.

Caching is the temporary storage of frequently accessed data in higher speed media, like RAM or local disk, for more efficient retrieval. Web caching stores frequently used objects closer to the client through browser, proxy, or server caches. By storing objects closer to your users, you avoid round trips to the origin server, greatly reducing bandwidth consumption, server load, and most importantly, latency. Cached pages "feel" much faster to the client because they load faster.

Caching is not just for static sites, even dynamic sites can benefit from caching. Graphics and multimedia typically don't change as frequently as HTML files. Graphics that seldom change like logos, headers, and navigation can be given longer expiration times while resources that change more frequently like XHTML and XML files can be given shorter expiration times. By designing your site with caching in mind, you can target different classes of resources to give them different expiration times with only a few lines of code.

Most Web servers allow you to set Expires response headers in a number of ways. Commonly, they will allow setting an absolute time to expire, a time based on the last time that the client saw the object (last access time), or a time based on the last time the document changed on your server (last modification time).

Expires headers are especially good for making static images (like navigation bars and buttons) cacheable. Because they don't change much, you can set extremely long expiry times on them, making your site appear much more responsive to your users. They're also useful for controlling caching of a page that is regularly changed. For instance, if you update an RSS page once every 4 hours, you can set the object to expire at that time, so caches will know when to get a fresh copy without users having to hit 'reload'. Here is an example of the Expires directives in Apache. Check the Calomel.org Home Page for more information about setting up web servers like Lighttpd, Nginx and Apache.

# Expires mod
ExpiresActive On
ExpiresDefault "access plus 4 hours"
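
If you want to hand different classes of resources different lifetimes, as described above, mod_expires also offers ExpiresByType. A possible sketch, where the types and times are only illustrations:

# Longer lifetimes for images that rarely change, shorter for markup
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType text/html "access plus 4 hours"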



The If-Modified-Since request header

Most search bots (Google, Yahoo, and MSN) and some clients will ask your server if the page has changed since its last visit. If it has not, then your server will return a 304 (not modified) code with no body. If your page has changed, your server will send the page like normal.

The purpose of this feature is to allow efficient updates of cached information with a minimum of transaction overhead.

A GET method with an If-Modified-Since header and no Range header requests that the identified page be transferred only if it has been modified since the date given by the If-Modified-Since header.
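
A typical exchange looks roughly like this; the URL and dates are only illustrative:

GET /index.html HTTP/1.1
Host: www.example.org
If-Modified-Since: Sat, 01 Jan 2005 00:00:00 GMT

HTTP/1.1 304 Not Modified
Date: Sun, 02 Jan 2005 12:00:00 GMT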



Picture type and quality

Photos and graphics use significantly more bandwidth than HTML text, so make sure you optimize (compress) images down to the smallest possible file size. Save images at no higher than 72 dpi and choose the right format for the image. Make your photos as physically small as they need to be to convey the intended message. Also, whenever possible, avoid serving images if HTML or CSS can do the job.

JPG images are highly compressed and efficient to send out to clients. You can still edit the pictures and change the quality. In GIMP and Photoshop, for example, you can save a picture in JPG format and set the image quality level anywhere from 0 to 100% of the original quality. Test your pictures and see if you can reduce the quality while keeping the image looking good. On most images you can reduce the quality level down to as low as 50% without seeing a huge degradation in the image. Here at calomel.org we have been able to reduce our picture sizes by 60% using this simple method. Take a look at the picture at the top of this page. It is a JPG that was 27 kilobytes; we reduced the quality by 60% and the picture is now 8 kilobytes.
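
If you have ImageMagick installed you can also experiment with quality levels from the command line. A quick sketch, where the file names are placeholders:

# re-encode a JPEG at 50% quality; compare the result to the original before replacing it
convert photo_original.jpg -quality 50 photo_test.jpg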



The Cursed favicon.ico

The favicon.ico is the little picture at the left of the URL in the address bar of the browser. The favicon "feature" was created by Microsoft and turned out to be a curse: the Internet Explorer web browser would request a favicon.ico from a set URL path (/favicon.ico) on every website. In the spirit of making all of your pictures small, you need to make the favicon.ico _really_ small. Try to make it no larger than a single layer, 4 color, 16x16 pixel image. If you do not want to make your own then search on Google for examples. There are plenty of pages where you can see other people's designs and download them for free. Use any premade .ico file if it is small and suits your needs.

When serving out pages the .ico file can be compressed by your web server, which can reduce its size by 60%. Also, it is sent out with every page a client asks for; in fact some clients actually ask for it twice for no reason. The favicon.ico does not have to be referred to by a link on your page; it is always inferred that the file exists. Thus, you can not limit access to the .ico file by referrer value.

To reduce bandwidth we suggest compressing the favicon.ico AND keeping it small, ideally around 512 bytes and no more than 1.1 kilobytes, so it can be sent inside one TCP packet. A single TCP packet is 1500 bytes and can carry a 1460 byte data payload on most OSes. Make sure the favicon.ico fits into one TCP packet by keeping it under 1460 bytes. There is also nothing wrong with serving a 1x1 pixel empty image or returning a 204 (No Content) if you decide you do not want to waste bandwidth on the favicon.ico.
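
If you go the 204 route, a minimal sketch for nginx might look like this, assuming an existing server block in nginx.conf:

# answer favicon requests with an empty 204 and keep them out of the logs
location = /favicon.ico {
    access_log    off;
    log_not_found off;
    return 204;
}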

You may also want to specify the name of your favicon file. Many bad clients and scanners will assume there is an ico file in the default location /favicon.ico, and these clients are notorious for wasting bandwidth. By using the following code in the "head" section at the top of your pages you can tell clients where to access the picture you want used as the favicon.

<link rel="shortcut icon" type="image/x-icon" href="some_favicon.ico" />

Lastly, make sure you have the smallest .ico file you can live with and watch your access logs. We have heard of servers that see 30% of their bandwidth dedicated to just serving out the favicon.ico file.



Reduce the number of connections the client makes

Every object you have on your page will require at least one request from the client. If you have 3 pictures, the main HTML page and a CSS file then you are serving out 5 objects and the client will try to make 5 connections to you. If you reduce the number of objects on your page, the page will load faster and the client will not make as many requests to the server.

Remember, if you also serve out advertisement (ad) references from your page the client will make one connection for every ad. If the ad server is slow this will make your page look a lot slower than what your server can send out. In fact, most clients will load the objects in the order they are listed in the main HTML file. If you have 4 ads at the top of the page the entire page may look like it has stalled because the ad server has not sent the data. According to the client, it is not the ad server that looks sluggish, but your site. If your site seems slow the client will look elsewhere for the answers they seek and avoid your host name.

Can you suggest a good web page optimization reporting site?

We highly suggest taking a look at PageTest - Web Page Optimization and Performance Test. It is free and will analyze a page and show graphs pertaining to your sites performance and speed of delivery.



Use KeepAlives, but not for too long (5 second timeout)

Keepalives are persistent connections between a client browser and a server. Originally, HTTP was envisioned as being stateless. Prior to keepalive, every image, javascript, frame, etc. on your site was requested using a separate connection to the server. When keepalives came into widespread use with HTTP/1.1, web browsers were allowed to keep a connection to a server open in order to transfer multiple files across that same connection. Fewer connections, less overhead, more performance. There are some problems though: Apache and Lighttpd by default keep the connections open for too long. The default is around 30 seconds, but you can get by easily with 5 seconds.

Keepalive limits tell the server that when a browser stops requesting files, it should wait X seconds before terminating the connection. If your visitors are on a decent connection, 5 seconds is plenty of time to wait for the browser to make additional requests. The only reason you would want to set a higher KeepAliveTimeout is to keep a connection open for the NEXT page request. That is, the user downloads a page, renders it completely, and then clicks another link within the keepalive timeout period.

A timeout of 30 would be appropriate for a site where people click from page to page often. If you are running a lower volume site where people click, read for a while and then click again, you can set a short timeout. With keepalive limits you are essentially taking one or more webserver processes and saying: for the next X seconds, do not listen to anyone but this one client, who may or may not actually ask for anything. For each keepalive process sitting idle you are using a file descriptor (open file handle) that could be used for a new client. The server is optimizing one case at the expense of all the other people who are hopefully hitting your site.
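
For Apache, the relevant httpd.conf directives are shown below; the values reflect the suggestions above rather than the shipped defaults:

# keep connections open, but only wait 5 seconds for the next request
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5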



Enable pipelining on the server and the clients

HTTP pipelining is a technique in which multiple HTTP requests are written out to a single socket without waiting for the corresponding responses. Pipelining is only supported in HTTP/1.1, not in 1.0. The pipelining of requests results in a dramatic improvement in page loading times, especially over high latency connections such as satellite Internet connections. Since it is usually possible to fit several HTTP requests in the same TCP packet, HTTP pipelining allows fewer TCP packets to be sent over the network, reducing network load. HTTP pipelining requires both the client and the server to support it. HTTP/1.1 conforming servers are required to support pipelining. This does not mean that servers are required to pipeline responses, but that they are required not to fail if a client chooses to pipeline requests.

Web servers like Apache and Nginx already offer support by default so you do not need to do anything to take advantage of pipelining. Exceptions include IIS 4 and, reportedly, IIS 5. Also, take a look at the discussion from die.net called Optimizing Page Load Time.

The fastest and most efficient way to implement a browser is to use pipelining. This is where a single persistent connection is used, but instead of waiting for each response before sending the next request, several requests are sent out at a time. This reduces the amount of time the client and server spend waiting for requests or responses to cross the network. Pipelined requests with a single connection are faster than multiple HTTP/1.0 requests in parallel, and considerably reduce the number of packets transmitted across the network. Apache supports both HTTP/1.0 keep-alives and HTTP/1.1 persistent connections. Pipelining is implemented entirely at the browser end, using persistent connections.

To enable pipelining in the Firefox browser go to the URL about:config, then search for "pipe" and set "network.http.pipelining" and "network.http.proxy.pipelining" to true. Also increase "network.http.pipelining.maxrequests" from 4 to 8.
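
If you prefer, the same preferences can be placed in a user.js file in your Firefox profile directory; a sketch:

// user.js -- enable HTTP pipelining in Firefox
user_pref("network.http.pipelining", true);
user_pref("network.http.proxy.pipelining", true);
user_pref("network.http.pipelining.maxrequests", 8);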

Implementation in web browsers: Internet Explorer as of version 7 doesn't support pipelining. Mozilla Firefox 2.0 supports pipelining, but it's disabled by default. It uses some heuristics, especially to turn pipelining off for IIS servers. Instructions for enabling pipelining can be found at Firefox Help: Tips & Tricks. Camino does the same thing as Firefox. Konqueror 2.0 supports pipelining, but it's disabled by default. Instructions for enabling it can be found at Konqueror: Tips & Tricks. Opera has pipelining enabled by default. It uses heuristics to control the level of pipelining employed depending on the connected server.



Watch the bots

Bots can make a big impact on the amount of data your server sends out. Use the robots.txt file to disallow pages or directories you do not want indexed. For example, you can disallow the Google image bot from looking at any of your pictures. Take a look at the Robots.txt "how to" for more information about the robots.txt file.
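
For example, a robots.txt entry that keeps Google's image bot out of a picture directory could look like this; the directory name is only an illustration:

# robots.txt -- keep the Google image bot away from the picture directory
User-agent: Googlebot-Image
Disallow: /images/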

All of your pages have a "modified" date and time. This is normally the time the HTML page was last edited. The web server will use this time to notify clients when the page last changed. You can see the "modified" time by checking out the "page info" section of your browser. When a bot checks this information and finds it has not changed since the last time it checked the page, it will not index the page again. On the other hand, if you want to make sure that all bots look at all of your pages again, you can "touch" all of the HTML files with the current time/date.



Stop image-hijacking

There will come a time when you look at your access logs or traffic graphs and see that a picture, or set of pictures, is responsible for most of your bandwidth, using well over the normal amount of bandwidth you expect those pictures to use. You may also notice the pages the pictures are on are not being served up as many times as the pictures are.

What you may be seeing is image hijacking. This is the practice of other people linking to your images from other sites. They might be using a picture as a signature on a blog or forum, or passing it off as a piece of work they are taking credit for. They do not care where they linked it from because they do not supply the bandwidth.

You need to have a plan in place for when this happens. The easiest way to stop this activity is to change the name of the picture so it can no longer be served out. You could also replace the link to the picture with a link to an ad, or change the picture to point to another one which the admin of the offending site will find objectionable. Your goal is to stop this unwanted traffic load on your site's server, so use your imagination and keep a few emergency pictures around.

When you make a picture for your site, use something to make sure people know where the image is from; this will discourage users from linking to it. With GIMP or Photoshop you could place a watermark on the picture, such as a faint image of your site logo or the URL. Also, watch the dimensions of the pictures you make. The most commonly stolen pictures are small to medium size square images. Use images that look good on your site, but will probably not fit well into a signature box on a forum or on someone else's site.

You may also choose to use the rewrite engine in your webserver. Let's use Apache as an example. The first directive tells Apache to use the rewrite engine. The two referer conditions say the rule only applies when a referrer is present and it does not come from your own site; the extra condition on the request URI keeps the rule from redirecting the error image to itself. The RewriteRule then substitutes the file error.gif for any GIF or JPG request. In layman's terms, if a person reached the picture from somewhere other than your site, Apache will automatically serve a specific image of your choosing instead. For more information about Apache check out the Apache "how to".

#### stop image hijacking (apache httpd.conf)
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://www.calomel.org/.*$ [NC]
RewriteCond %{REQUEST_URI} !error\.gif$
RewriteRule .*\.(gif|GIF|jpg|JPG)$ http://www.calomel.org/error.gif [R,L]

In Lighttpd (lighty) you can stop image hijacking by using the following lines in your lighttpd.conf file. For more information about Lighttpd check out the Lighttpd "how to".

#### stop image hijacking (lighttpd.conf) 
$HTTP["referer"] !~ "^(http://calomel\.org)" {
    url.access-deny = ( ".jpg", ".jpeg", ".png", ".avi", ".mov" )
}

In Nginx, clients can be sent an error code to stop image hijacking by using the following block in your nginx.conf file. For more information about Nginx check out the Nginx "how to".

## Only allow these file types to document root
   location / {
     if ($request_uri ~* (^\/|\.html|\.jpg|\.org|\.png|\.css|favicon\.ico|robots\.txt)$ ) {
       break;
     }
     return 444;
   }



Submit a Sitemap

A sitemap is a listing of URLs you want a search engine to know about. The reason you would want to make a list of the pages on your site, instead of just letting the search engine find the pages itself, is efficiency. If you tell them about your site they will not have to guess what you want the world to see. A bot will try to follow any link it sees on your site; if you specify exactly which pages you want indexed, the bot will not have to try every link to see what works. This reduces the time it takes for your pages to be indexed and reduces the amount of wasted bandwidth. Once you register your sitemap with a search engine, its bots can concentrate on the pages listed in the sitemap.

First you must make a sitemap file. The sitemap file can be either in xml or text (txt) format. Google accepts xml or txt and Yahoo accepts only txt. For simplicity we suggest making one text (txt) file accessible to both search engines. Other search engines like MSN and ASK do not accept sitemaps, but that is not really a problem as Google and Yahoo are the two most highly trafficked sites in the world.

For a simple site with a straightforward web tree the best approach may be a sitemap in text format. All you need to do is create a text file with any name, like sitemap.txt, in your web tree with a listing of the URLs you want the search engine to index. Something like the following:

http://your_site.com/
http://your_site.com/my_stuff.html
http://your_site.com/friends.html
http://your_site.com/software/your_stuff.html

If you have a more complicated web tree or a site that is dynamically created, then an XML sitemap may be the best approach. There are many tools that will help you create an XML file. For example, XML Sitemaps has a free online utility to create the sitemap.xml file for you through the browser. Google also offers scripts you can run locally on your systems to auto-generate XML sitemaps.
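
For reference, a minimal sitemap.xml following the sitemaps.org protocol looks roughly like this; the URLs and dates are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://your_site.com/</loc>
    <lastmod>2008-01-01</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>http://your_site.com/software/your_stuff.html</loc>
  </url>
</urlset>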

Lastly, to tell Google and Yahoo about your sitemap file you must go to their respective sites and register for a webmaster account. Once logged in they will step you through the sitemap verification process. Here are the links for the Google Webmasters page and the Yahoo Site Explorer page.



Trim the HTML

If your website receives a high number of hits, even the smallest reduction in your HTML document's size will mean a substantial reduction in bandwidth usage. Here are a few tips and tricks:





Use a program to strip out whitespace (spaces, tabs, newlines, etc.)

White space in HTML code normally consists of multiple spaces, tabs (\t), carriage returns (\r) and newline (\n) characters. These are usually kept in HTML code so a human can read the source more easily when editing it. The problem is the remote HTML browser does _not_ need these characters for the page to be rendered correctly. Look for a whitespace stripper for your HTML pages; it could be built into the web server or a third party script (Perl, PHP, etc.). All major sites like Google, Slashdot and Digg use some sort of whitespace stripper to save bandwidth.

How much bandwidth can you expect to save by using a whitespace stripper? On average you could save 2% of the total size of HTML pages written by hand or previously unoptimized. If your average page size is 100 kilobytes you could save around 2 kilobytes every time a page is served. If you serve a hundred thousand pages per month you would reduce your bandwidth usage by 200 megabytes per month. That is 200 MB/month of bandwidth wasted sending characters remote clients do not use or ever see. When you are paying for bandwidth, every bit makes a difference.
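
As a rough sketch only, a Perl one-liner can strip leading indentation and trailing spaces from static HTML files. A naive pass like this will mangle anything inside <pre> or <textarea> blocks, so test it on copies first:

# strip leading and trailing whitespace from each line; originals are saved as .bak
perl -i.bak -pe 's/^[ \t]+//; s/[ \t]+$//;' *.html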





Banner Ads are simply bad content

Ads on your site are not good content. In fact most users have ways of stopping the ads from displaying (turning off javascript or denying access to ad serving host names) and most others simply ignore them. Ads should not be expected to pay for a site's expenses and, it seems, ads are becoming more useless to users. They have been burned too many times by bad ad links or deceptive advertising schemes. It is simply not worth a user's time.

If you decide to put ads on your site then make sure to set them up correctly. Use a more trustworthy ad serving company like Google's AdSense and make the ads fit into your site. Make sure they are sparsely placed and DO NOT interfere with your content. People come to find answers to their questions, not to see ads; they can see ads somewhere else. Never insult your visitors with a page that is anything less than 90% content.





Designing a site - the beginning

Conceptualization - Before designing a website, the first thing to do is work out the concept or the reason behind the website. Why does this site need to exist if there are plenty of other sites that offer similar content? It is important to get a complete understanding of the idea behind the website to do justice to the design. No website works with confusing text or content.

Easy to use - Making a website is of no use to anyone unless it is easy to access and use. Always design a website with easy navigation to help visitors understand what they are looking at. Imagine going to a site with flashy designs and amazing graphics, but not being able to find the information you came for. Such confusing designs are bad for you, the site's designer, and for the site's ranking.

Search engine efficient - A website thrives only if it is recognized and ranked by search engines, so it is critically important to design a website which is easily found in search engine listings and accepted by search engine bots. Simplify your layout and try to stay with as few columns as possible. What is difficult for a human to navigate is practically impossible for a bot.

Understand your audience - Until you know your target you can not design an effective website. Ask yourself, "Who am I trying to reach?". Once you understand your target audience you can design a site to serve their expectations. Web surfers today are knowledgeable and they have seen many other sites. Remember, a visitor has no problem closing the tab on your site if you do not meet their expectations.

Informative and simple design - It is important to treat your visitor as someone who knows nothing about your site. Provide them with everything they need to find the information they came for. The average reader will take less than a second to evaluate your site. Do not make them jump through hoops to find the information they are looking for. You may have what they searched for, but if they can not find it quickly they will leave. Provide a menu, a table, a listing or a search box to give them the tools they need to find the information quickly. Otherwise, they will leave. This is guaranteed.

Reduce the ads - Ads are an excellent way for the site maintainer to make money, but they can be a source of confusion to the visitor. Ads can be flashy and get in the way of content. If the site has more ads per page area than content, it is losing visitors. Ads can also make your site load a lot slower: for every ad you have on the site the visitor must make one connection to the ad server, and if the ad server is slow the visitor will blame your site, not the ad server.



Want more speed? Make sure to also check out the Network Speed and Performance Guide. With a little time and understanding you could easily double your firewall's throughput.



Engineering the site - the delivery

Pages that load quickly. If your site does not load within one second, or at most five seconds, chances are that most people will simply leave. You may have a speedy Internet connection, but keep in mind that not all people do. Check the size of your pictures and make sure you are not expecting people to download a 500KB JPG for your background. Make sure you know how your server acts under load as this is key to a fast loading site.

Know how much bandwidth you have. You are always going to have bandwidth constraints, whether you host the site at home or pay a provider. Make sure you know how much you can serve and how fast. If you expect to serve up 4 GIFs at 16 million colors each, you had better make sure you have enough network capacity. Even better, make your site small and efficient. Your pictures should only be as big and detailed as they absolutely have to be. If you can reduce the quality of the pictures even by half you may be able to reduce their size by a factor of 4. This means your site loads faster and you use less bandwidth per visit.

Text on your pages should be easy to read. You will want the size of the text to be big enough to easily read and a background color that does not obscure the text. To be on the safe side, it is recommended to use black text on a white background as 95% of web pages do. If you would like to use a little more color, choose carefully, making sure the page is still easy to read. If you have doubts ask friends, family and colleagues. They might not know web design, but they do know what they like. See what people say and accept the constructive criticism.

Easy to navigate. All text links and graphic elements such as buttons and tabs should be easy to read and use. Links found within your article should flow appropriately. You do not want your visitors leaving because they were not able to figure out how to get your site to work.

Website design and layout should be consistent. If you switch from one style to another too often, you will confuse your visitors. It only makes sense that if the website design is suddenly too different, people are bound to think that they are at another website altogether.

Focus your web design on browser compatibility. Not all people use Internet Explorer (thankfully). Be sure that your site can be viewed using Mozilla, Safari, Opera and Firefox. If possible, test your website on both a Mac and a PC. Sites targeting markets such as technology should be especially careful, because these readers are more likely to use some of the newer browsers. Also keep in mind that you do not have to support every browser version ever made. If someone is using a browser version made 5 years ago they need to update, because yours is not the only site that is going to break on their machine. Do not worry about really old browsers.

Website design for all screen resolutions. You may have your resolution set to 1600x1200, but remember, some people still use 800x600. A website that looks great in high resolution may not be so easy to view at 800x600. Try a few common resolutions and see how your site looks, then pick a style that looks good. Also, do not put a tag that says "this site looks best at X", as people won't change their screen resolution to look at one web page.



Engineering the site - maintenance

Relative links. If you are referencing pages on your own site it is easier to use relative instead of absolute links. The remote client's browser will use the "Host" header, which is a standard, required header, to work out which site the links belong to. The idea is that all of the links on your site are relative to the document root.

For example, if one of the absolute links was to the main page "http://site_name.com" then the relative link would be "/". When a browser goes to the page it knows the host name from the URL (the Host header), namely "site_name.com". When the browser sees the link "/" it knows this link as the document root of site_name.com, or "http://site_name.com/". When you connect using an ssh tunneling proxy or use a browser locally on the machine, you initially tell the browser to connect to localhost. Now localhost is the "Host" header and the link "/" means "http://localhost/".
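
In practice the difference is just the href value; site_name.com is a placeholder here:

<!-- absolute link: ties the page to one host name -->
<a href="http://site_name.com/software/your_stuff.html">software</a>

<!-- relative link: resolved against whatever host the browser was asked for -->
<a href="/software/your_stuff.html">software</a>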



Content over fluff

If your site contains more flash than content, visitors will get tired of it and leave. If you have a story that takes up 2 written pages, but you spread it across 15 HTML pages, you will lose readers. Do not make your visitors click next, next, next as they will just close the browser tab.

Do not try to trick your readers by making ads look like links on your page. Readers will get wise to your tricks and avoid your pages completely.

Reduce the number of cute icons, animations and other distracting content. Let your visitors do what they came to do: read your content.

Make the site descriptive and full of content your target audience is looking for. The average reader is on the web looking for information. They do not have a lot of patience and they do not want to waste their time.



Check out the Yahoo Developer Network

Yahoo has published a page with many ideas they have found to speed up web site delivery times. The list was compiled by their "Exceptional Performance" team, which has identified a number of practices for making your web pages load faster. The list includes 34 best practices divided into 7 categories: see Best Practices for Speeding Up Your Web Site.





Questions, comments, or suggestions? Contact Calomel.org