Apache Web Cluster



(Reverse Proxy, ACL, Failover and High Availability)





If you have a single URL as the presence for your company or organization, you may have run into the limits of serving all of your content from one physical server. No matter how powerful, the web server machine gets overloaded, and your potential customers are put off by a slow, unresponsive website. The problem is that your main hostname is already out in the wild and in the search engines. You cannot just buy a few more hostnames for new machines and expect your customers to find you. What you need to do is split the functions of your web server across multiple machines, but have all of them accessed through your one publicly established URL. A web server proxy does exactly this.

There are companies who sell proxy servers, and you may have thought about buying an expensive network appliance like F5's BIG-IP to get one front-end proxy for many back-end web servers. You like its ability to enforce access control lists and limit access from abusive clients in real time. The biggest problem is the price. Your organization might not be able to pay tens of thousands for the equipment plus the maintenance costs for big iron like this. Besides, you know what you want, and you want fine-grained control of the cluster without having to rely on anyone else. You want to build your own cluster front end with the options above. Let's begin.

What we are going to build is a web cluster front end using Apache v2.2, backed by seven (7) example servers. You can have more or fewer and grow without your internet customers ever knowing. Each server behind the proxy can run whatever software it needs, since it only talks to the proxy and never directly to the internet clients.

Using a proxy keeps the servers behind it safer because the proxy filters all input to the web cluster first. When a client requests data, it contacts the proxy. The proxy accepts the request and then makes its own request to the internal servers. Notice that the untrusted external clients never talk to the internal systems; they only talk to the proxy in the middle.



Getting Started

Let's take a look at the setup in our example web cluster. The "web-proxy" is the machine remote clients connect to. It responds to client requests, filters them and decides which internal machine will fulfill each request. To give you a good idea of what a web proxy can do, we will set up seven (7) web cluster machines.

NOTICE: The binary download servers each have a primary machine and a "hot spare" backup. We are going to set up a cluster that enables the backup machine if the primary ever goes down. If you need to save money, the primary can be the high speed machine you spend your money on, and the backup can be a moderately powered machine that just gets the job done until the primary comes back up. The machines "webserver_one" and "webserver_two" are load balanced and get a 50/50 split of traffic. If one goes down, all traffic is sent to the machine that is still up.

              /--- news_server.domain.lan [/news_rss and /news_blog]
              |
              |--- paid_one.domain.lan [/paid_user] (primary)
              |--- paid_two.domain.lan [/paid_user] (secondary "hot spare")
web-proxy ----|
              |--- free_one.domain.lan [/free_user] (primary)
              |--- free_two.domain.lan [/free_user] (secondary "hot spare")
              | 
              |--- webserver_one.domain.lan (load balanced, 50% traffic)
              \--- webserver_two.domain.lan (load balanced, 50% traffic)



Looking at the httpd.conf

Below you will find a link to the Apache proxy example file, and below that the same httpd.conf file in a text box. Both formats are available to make it easier for you to review the code. This example is a fully working config file except for a few variables you need to set for your environment. The config file is fully commented.

You can download the httpd.conf from the link below by doing a "save as", or just click on the link and choose download. Before using the config file, take a look at it below or download it and look at the options.

apache proxy httpd config file

#######################################################
###  Calomel.org  proxy httpd.conf   BEGIN
#######################################################
#
# ServerRoot: The top of the directory tree under which the server's
# configuration, error, and log files are kept.
#
# Do not add a slash at the end of the directory path.  If you point
# ServerRoot at a non-local disk, be sure to point the LockFile directive
# at a local disk.  If you wish to share the same ServerRoot for multiple
# httpd daemons, you will need to change at least LockFile and PidFile.
#
ServerRoot "/usr/local/apache2"

#
# Listen: Allows you to bind Apache to specific IP addresses and/or
# ports, instead of the default. See also the <VirtualHost>
# directive.
#
# Change this to Listen on specific IP addresses as shown below to 
# prevent Apache from glomming onto all bound IP addresses.
#
Listen 80

#
# Dynamic Shared Object (DSO) Support
#
# To be able to use the functionality of a module which was built as a DSO you
# have to place corresponding `LoadModule' lines at this location so the
# directives contained in it are actually available _before_ they are used.
# Statically compiled modules (those listed by `httpd -l') do not need
# to be loaded here.
#
LoadFile /usr/lib/libxml2.so
LoadModule evasive20_module modules/mod_evasive20.so
LoadModule security2_module modules/mod_security2.so
#


#
# If you wish httpd to run as a different user or group, you must run
# httpd as root initially and it will switch.  
#
# User/Group: The name (or #number) of the user/group to run httpd as.
# It is usually good practice to create a dedicated user and group for
# running httpd, as with most system services.
#
User web_daemon
Group web_daemon


# 'Main' server configuration
#
# The directives in this section set up the values used by the 'main'
# server, which responds to any requests that aren't handled by a
# <VirtualHost> definition.  These values also provide defaults for
# any <VirtualHost> containers you may define later in the file.
#
# All of these directives may appear inside <VirtualHost> containers,
# in which case these default settings will be overridden for the
# virtual host being defined.
#
# ServerAdmin: Your address, where problems with the server should be
# e-mailed.  This address appears on some server-generated pages, such
# as error documents.  e.g. admin@your-domain.com
#
ServerAdmin webmaster@your_hostname.com

#
# ServerName gives the name and port that the server uses to identify itself.
# This can often be determined automatically, but we recommend you specify
# it explicitly to prevent problems during startup.
#
# If your host doesn't have a registered DNS name, enter its IP address here.
#
ServerName your_hostname.com:80

# Timeout: The number of seconds Apache waits for I/O (receiving a
# request, or a reply from a backend server) before giving up.
Timeout 180

##################################################################
#
# Client to server request limitations
LimitRequestBody 102400
LimitRequestFields 40
LimitRequestFieldsize 1000
LimitRequestLine 1000

#
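# Mod_Rewrite limits on acceptable characters: a URL path containing any
# character other than letters, digits, '.', '/', '_' or '-' gets a 403.
# (Inside [...] a '|' is a literal pipe, not "or", so it is left out.)
RewriteEngine on
RewriteLog /usr/local/apache2/logs/mod_rewrite.log
RewriteLogLevel 0
RewriteRule [^a-zA-Z0-9./_-]  -  [F]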
#
##################################################################

# Mod_Proxy Settings
  <IfModule mod_proxy.c>
    ProxyRequests Off
    ProxyPreserveHost On
   <Proxy *>
    Order deny,allow
    Allow from all
   </Proxy>

   ## blogging and rss to the news_server
   ## (one server handles both the rss and blog trees)
     ProxyPass /news_rss/ http://news_server.domain.lan/news_rss/
     ProxyPassReverse /news_rss/ http://news_server.domain.lan/news_rss/

     ProxyPass /news_blog/ http://news_server.domain.lan/news_blog/
     ProxyPassReverse /news_blog/ http://news_server.domain.lan/news_blog/

   ## binary download cluster (perhaps a paid HIGH speed cluster)
   ## (fail over cluster - if "one" goes down "two" takes over)
     ProxyPass /paid_user/ balancer://paid_servers/
      <Proxy balancer://paid_servers>
        BalancerMember http://paid_one.domain.lan/paid_user route=a redirect=b
        BalancerMember http://paid_two.domain.lan/paid_user route=b status=+H
      </Proxy>
     ProxyPassReverse /paid_user/ balancer://paid_servers/

   ## binary download cluster (perhaps a public free LOW speed cluster)
   ## (fail over cluster - if "one" goes down "two" takes over)
     ProxyPass /free_user/ balancer://free_servers/
      <Proxy balancer://free_servers>
        BalancerMember http://free_one.domain.lan/free_user route=a redirect=b
        BalancerMember http://free_two.domain.lan/free_user route=b status=+H
      </Proxy>
     ProxyPassReverse /free_user/ balancer://free_servers/

   ## The rest of the traffic goes to the default webservers
   ## (load balanced cluster - equal traffic goes to "one" and "two")
     ProxyPass / balancer://web_servers/
      <Proxy balancer://web_servers>
        BalancerMember http://webserver_one.domain.lan
        BalancerMember http://webserver_two.domain.lan
      </Proxy>
     ProxyPassReverse / balancer://web_servers/
  </IfModule>

#
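# Mod_evasive to avoid DDOS
DOSWhitelist 10.10.10.2
# DOSLogDir takes a directory (used for mod_evasive's lock files), not a
# file name; create it and make it writable by the web_daemon user.
DOSLogDir "/usr/local/apache2/logs/mod_evasive"
<IfModule mod_evasive20.c>
    DOSHashTableSize    314739
    DOSPageCount        2
    DOSPageInterval     1
    DOSSiteCount        30
    DOSSiteInterval     1
    DOSBlockingPeriod   30
</IfModule>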

#
# Mod_Security settings
Include conf/modsecurity/*.conf
SecAuditLog /usr/local/apache2/logs/modsec_security.log
SecServerSignature your_hostname.com

#
# Mod_Expires to tell clients to cache files 
ExpiresActive On
ExpiresDefault "access plus 2 hours"

##################################################################

#
# EnableMMAP and EnableSendfile: On systems that support it, 
# memory-mapping or the sendfile syscall is used to deliver
# files.  This usually improves server performance, but must
# be turned off when serving from networked-mounted 
# filesystems or if support for these functions is otherwise
# broken on your system.
#
EnableMMAP on
EnableSendfile on

#
# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
# container, error messages relating to that virtual host will be
# logged here.  If you *do* define an error logfile for a <VirtualHost>
# container, that host's errors will be logged there and not here.
#
ErrorLog logs/error_log

#
# LogLevel: Control the number of messages logged to the error_log.
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
#
#LogLevel debug
LogLevel notice

<IfModule log_config_module>
    #
    # The following directives define some format nicknames for use with
    # a CustomLog directive (see below).
    #
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    LogFormat "%h %l %u %t \"%r\" %>s %b" common

    <IfModule logio_module>
      # You need to enable mod_logio.c to use %I and %O
      LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
    </IfModule>

    #
    # The location and format of the access logfile (here the Combined
    # Logfile Format).  If you do not define any access logfiles within a
    # <VirtualHost> container, they will be logged here.  Contrariwise, if
    # you *do* define per-<VirtualHost> access logfiles, transactions will
    # be logged therein and *not* in this file.
    #
    CustomLog logs/access_log combined

    #
    # If you prefer a logfile with access, agent, and referer information
    # (Combined Logfile Format) you can use the following directive.
    #
    #CustomLog logs/access_log combined
</IfModule>

#
# Customizable error responses come in three flavors:
# 1) plain text 2) local redirects 3) external redirects
# NOTE: using straight text is small and efficient.
#
ErrorDocument 505 "your_custom_error_message_here"
ErrorDocument 504 "your_custom_error_message_here"
ErrorDocument 503 "your_custom_error_message_here"
ErrorDocument 502 "your_custom_error_message_here"
ErrorDocument 501 "your_custom_error_message_here"
ErrorDocument 500 "your_custom_error_message_here"
ErrorDocument 417 "your_custom_error_message_here"
ErrorDocument 416 "your_custom_error_message_here"
ErrorDocument 415 "your_custom_error_message_here"
ErrorDocument 414 "your_custom_error_message_here"
ErrorDocument 413 "your_custom_error_message_here"
ErrorDocument 412 "your_custom_error_message_here"
ErrorDocument 411 "your_custom_error_message_here"
ErrorDocument 410 "your_custom_error_message_here"
ErrorDocument 409 "your_custom_error_message_here"
ErrorDocument 408 "your_custom_error_message_here"
ErrorDocument 407 "your_custom_error_message_here"
ErrorDocument 406 "your_custom_error_message_here"
ErrorDocument 405 "your_custom_error_message_here"
ErrorDocument 404 "your_custom_error_message_here"
ErrorDocument 403 "your_custom_error_message_here"
ErrorDocument 402 "your_custom_error_message_here"
ErrorDocument 401 "your_custom_error_message_here"
ErrorDocument 400 "your_custom_error_message_here"
#
## Other example error pages
#ErrorDocument 500 "The server has made a mistake."
#ErrorDocument 404 /missing.html
#ErrorDocument 404 "/cgi-bin/missing_handler.pl"
#ErrorDocument 402 http://www.example.com/subscription_info.html
#
#######################################################
###  Calomel.org proxy  httpd.conf   END
#######################################################
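To see which request paths the RewriteRule in the config lets through, its character whitelist can be mimicked with Python's re module. This is a rough sketch, not mod_rewrite itself: it assumes the pattern is tested against the already decoded URL path only (RewriteRule does not see the query string), and the helper name `is_forbidden` is hypothetical.

```python
import re

# Same whitelist as the RewriteRule: letters, digits, '.', '/', '_' and '-'.
# Any other character in the path makes mod_rewrite answer 403 Forbidden.
FORBIDDEN_CHARS = re.compile(r"[^a-zA-Z0-9./_-]")


def is_forbidden(url_path: str) -> bool:
    """Return True if the (already decoded) URL path would be refused."""
    return FORBIDDEN_CHARS.search(url_path) is not None


if __name__ == "__main__":
    for path in ["/news_rss/feed.xml", "/paid_user/file-1.tar.gz", "/a b", "/x;y"]:
        print(f"{path!r:30} {'403' if is_forbidden(path) else 'allowed'}")
```

Paths containing spaces or other punctuation are rejected, so file names on the back-end servers should stick to the whitelisted characters.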


Building the binaries

First, we need to install the following packages from source: Apache 2.2.x, mod_security, and mod_evasive. Stay on the 2.2 series: the httpd.conf above uses RewriteLog and RewriteLogLevel, which were removed in Apache 2.4. We build from source so we can set up exactly what we need and no more. Most of the modules Apache ships with are not needed on our proxy server, so let's keep it to a minimum.

The packages will be installed into the default directory /usr/local/apache2/ . The config files are in /usr/local/apache2/conf and the logs are in /usr/local/apache2/logs .

To build Apache 2.2 from source, make a working directory anywhere you want and download the source into it. Untar the package and change into the untarred directory. Execute the following line to build Apache with the modules we need for the proxy. The install will be put into /usr/local/apache2/ .

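This line will stop any running httpd, remove the old binaries in /usr/local/apache2/bin, clean the build tree, configure Apache with only the modules the proxy needs, and then compile and install it: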

killall httpd;rm -rf /usr/local/apache2/bin; make clean; ./configure --disable-cgi --disable-negotiation --disable-autoindex --disable-status --disable-userdir --enable-proxy --enable-proxy-http --enable-rewrite --enable-unique-id --enable-expires && make && make install

For security, and to reduce the chance that a DDoS will affect our new proxy, we are also going to install the modules mod_evasive and mod_security.

To build mod_evasive, download the latest source code into the working directory. Untar the package and execute the following line to build it. The -i flag tells apxs to install mod_evasive20.so into the Apache modules directory /usr/local/apache2/modules/ ; if the file is not there afterwards, copy it there by hand.

/usr/local/apache2/bin/apxs -i -a -c mod_evasive20.c

To build mod_security, download the latest source code into the working directory. Untar the package and execute the following line to build it. Copy the mod_security2.so file to the Apache modules directory /usr/local/apache2/modules/ .

make && make install






Editing the httpd.conf

Now that the install is finished and the security modules are built and copied to the correct directory, it is time to take the httpd.conf config file from above. Save the calomel.org httpd.conf file as /usr/local/apache2/conf/httpd.conf . Next, let's look at the directives that need your attention.

The User and Group directives need to name a valid non-privileged user and group. You may want to avoid generic system users like "nobody" or "daemon" and create your own user. For the example we used "web_daemon".

Check the ServerName and ServerAdmin directives to make sure they reflect the name of your server and the contact information in case someone has any problems.

Mod_Proxy Settings are the heart of this exercise. We have set up a reverse proxy server that fronts seven (7) clustered machines. A reverse proxy is simply a server that accepts connections from many clients on the internet and forwards those requests to a few servers in the cluster. A regular (forward) proxy, by contrast, lets a small number of clients access the internet as a whole.

The reverse proxy works by looking at the URL the client requested and deciding where on the cluster that request should be routed. Let's say the full URL for our server is "http://your_hostname.com". Let's look at what happens to a client request as it asks for data from our cluster.

The first proxy setting forwards all requests for http://your_hostname.com/news_rss/ to http://news_server.domain.lan/news_rss/ and all requests for http://your_hostname.com/news_blog/ to http://news_server.domain.lan/news_blog/.
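The prefix matching described above can be modeled in a few lines. This is a simplified sketch, not mod_proxy's code: the hypothetical `ROUTES` table mirrors the ProxyPass lines of the example config (assuming a trailing slash on every public prefix), and, as with ProxyPass, the first matching prefix wins, which is why `/` has to come last.

```python
# Hypothetical routing table mirroring the ProxyPass lines, in config order.
ROUTES = [
    ("/news_rss/", "http://news_server.domain.lan/news_rss/"),
    ("/news_blog/", "http://news_server.domain.lan/news_blog/"),
    ("/paid_user/", "balancer://paid_servers/"),
    ("/free_user/", "balancer://free_servers/"),
    ("/", "balancer://web_servers/"),  # catch-all, must stay last
]


def route(path: str) -> str:
    """Return the back-end URL for a public path; first matching prefix wins."""
    for prefix, target in ROUTES:
        if path.startswith(prefix):
            # Keep whatever follows the matched prefix.
            return target + path[len(prefix):]
    raise LookupError(f"no ProxyPass rule matches {path!r}")


if __name__ == "__main__":
    for p in ["/news_rss/feed.xml", "/paid_user/big.iso", "/index.html"]:
        print(p, "->", route(p))
```

If the catch-all `/` entry were moved to the top, every request would land on the web_servers balancer, which is the same pitfall as putting `ProxyPass /` first in httpd.conf.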

All traffic for http://your_hostname.com/paid_user/ goes to http://paid_one.domain.lan/paid_user/ , the high speed server we want to use for paying customers. If paid_one.domain.lan is down, all traffic goes to paid_two.domain.lan, which is marked as a hot standby with status=+H. Once paid_one.domain.lan comes back up, all traffic reverts from paid_two.domain.lan to paid_one.domain.lan. The idea is that paid_one.domain.lan is the fastest primary server and paid_two.domain.lan is only used as a backup.
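The hot standby behavior can be sketched as follows. The `Member` class and `pick` function are hypothetical and ignore details such as retry timers and the sticky route=/redirect= settings; they only show the rule that a member flagged with status=+H receives traffic when no regular member is available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    url: str
    hot_standby: bool = False  # models "BalancerMember ... status=+H"
    up: bool = True            # whether the proxy currently sees it as healthy


def pick(members: List[Member]) -> Optional[Member]:
    """Use a live regular member if there is one, else a live hot standby."""
    for m in members:
        if m.up and not m.hot_standby:
            return m
    for m in members:
        if m.up and m.hot_standby:
            return m
    return None  # everything is down; the proxy would return an error


if __name__ == "__main__":
    paid = [
        Member("http://paid_one.domain.lan/paid_user"),
        Member("http://paid_two.domain.lan/paid_user", hot_standby=True),
    ]
    print(pick(paid).url)  # paid_one serves the traffic
    paid[0].up = False
    print(pick(paid).url)  # paid_two takes over
    paid[0].up = True
    print(pick(paid).url)  # traffic reverts to paid_one
```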

The servers free_one.domain.lan and free_two.domain.lan use exactly the same setup as the paid_user machines. They are in the example to show that you can have many sets of clustered machines for different purposes.

All traffic not matched by the previous rules is forwarded to webserver_one.domain.lan and webserver_two.domain.lan in round robin load balancing mode. If either server goes down, 100% of the traffic is directed to the web server that is still up. Once the downed server comes back up, the proxy automatically load balances the traffic between the servers again (by default mod_proxy retries a failed member after 60 seconds). The cluster is flexible enough that you could add a webserver_three machine, add one more BalancerMember line to the httpd.conf, and the proxy would send 33% of the traffic to each machine.
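The even split comes from mod_proxy_balancer's default lbmethod, byrequests (request counting). Below is a Python transcription of the request-counting pseudocode from the mod_proxy_balancer documentation. It assumes every member has the default lbfactor of 1 and simply leaves out members that are down; the function name `byrequests` is only a label. It shows why adding webserver_three gives each machine a third of the requests.

```python
from collections import Counter
from typing import Dict, Iterable, List


def byrequests(lbfactors: Dict[str, int], requests: int,
               down: Iterable[str] = ()) -> List[str]:
    """Return which member serves each request, using request counting."""
    down = set(down)
    lbstatus = {name: 0 for name in lbfactors}
    chosen = []
    for _ in range(requests):
        total = 0
        candidate = None
        for name, factor in lbfactors.items():
            if name in down:
                continue  # failed members are skipped
            lbstatus[name] += factor
            total += factor
            if candidate is None or lbstatus[name] > lbstatus[candidate]:
                candidate = name
        if candidate is None:
            raise RuntimeError("no balancer member is available")
        # The winner "pays back" the total, so the others catch up next time.
        lbstatus[candidate] -= total
        chosen.append(candidate)
    return chosen


if __name__ == "__main__":
    three = {"webserver_one": 1, "webserver_two": 1, "webserver_three": 1}
    print(Counter(byrequests(three, 300)))
    print(Counter(byrequests(three, 300, down=["webserver_two"])))
```

With equal factors the members simply take turns, which is why the documentation-style description of "round robin" fits; raising one member's lbfactor would give it a proportionally larger share.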

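The mod_evasive settings help protect against DoS and DDoS attacks. With the values in the config, a client that requests the same page more than DOSPageCount (2) times within DOSPageInterval (1 second), or makes more than DOSSiteCount (30) requests to the site within DOSSiteInterval (1 second), is answered with 403 Forbidden for DOSBlockingPeriod (30) seconds. The DOSWhitelist directive takes a single IP address, or a wildcard pattern such as 10.10.10.*, that you wish to exclude from the evasive limits, like your local LAN.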
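The page and site limits can be approximated as below. This is a simplified sketch under stated assumptions, not mod_evasive's implementation: the class `EvasiveSketch` is hypothetical, its parameters are named after the DOS* directives with the example values as defaults, counting uses simple fixed windows, and the real module's habit of resetting the block timer on every request made while blocked is left out.

```python
import time
from typing import Dict, Iterable, Optional, Tuple


class EvasiveSketch:
    """Rough model of mod_evasive's per-page and per-site limits.

    Counters use simple fixed windows; this is not mod_evasive's code.
    """

    def __init__(self, page_count=2, page_interval=1.0, site_count=30,
                 site_interval=1.0, blocking_period=30.0,
                 whitelist: Iterable[str] = ()):
        self.page_count = page_count
        self.page_interval = page_interval
        self.site_count = site_count
        self.site_interval = site_interval
        self.blocking_period = blocking_period
        self.whitelist = set(whitelist)
        self.page_hits: Dict[Tuple[str, str], Tuple[float, int]] = {}
        self.site_hits: Dict[str, Tuple[float, int]] = {}
        self.blocked_until: Dict[str, float] = {}

    @staticmethod
    def _count(table, key, interval, now):
        """Increment the counter for key, starting a new window when it expires."""
        start, hits = table.get(key, (now, 0))
        if now - start >= interval:
            start, hits = now, 0
        table[key] = (start, hits + 1)
        return hits + 1

    def allow(self, ip: str, uri: str, now: Optional[float] = None) -> bool:
        """Return False when the request would get a 403 Forbidden."""
        now = time.monotonic() if now is None else now
        if ip in self.whitelist:
            return True
        if self.blocked_until.get(ip, float("-inf")) > now:
            return False
        page = self._count(self.page_hits, (ip, uri), self.page_interval, now)
        site = self._count(self.site_hits, ip, self.site_interval, now)
        if page > self.page_count or site > self.site_count:
            self.blocked_until[ip] = now + self.blocking_period
            return False
        return True
```

With DOSPageCount at 2, a third request for the same page inside one second is enough to block a client for 30 seconds, which is strict; browsers that reload quickly or fetch the same object twice can trip it, so test these numbers against your real traffic.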

The mod_security settings load the rule files found in conf/modsecurity/, set the audit log location, and set SecServerSignature, which replaces the Server response header. Make sure SecServerSignature is set to your hostname.

Mod_expires tells clients to cache files so you can save bandwidth. When a client moves from one page to another on your site, you can ask it to keep a cached copy of objects like pictures and buttons so it does not download the same data over and over again. ExpiresDefault "access plus 2 hours" makes each response cacheable for two hours from the time it was requested.
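With ExpiresDefault "access plus 2 hours", mod_expires sends both an Expires header and a Cache-Control max-age directive. The hypothetical helper below computes the values a client should see for a given request time; it is a sketch of the arithmetic, not mod_expires itself.

```python
import time
from email.utils import formatdate


def expires_headers(access_time: float, max_age: int = 2 * 60 * 60) -> dict:
    """Headers for ExpiresDefault "access plus <max_age> seconds"."""
    return {
        "Cache-Control": f"max-age={max_age}",
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(access_time + max_age, usegmt=True),
    }


if __name__ == "__main__":
    print(expires_headers(time.time()))
```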

Customizable error responses let you control how a remote client is treated when it encounters an error. Each response can be 1) plain text, 2) a local redirect or 3) an external redirect to help clients who hit an error find your content. In the example we set all errors to the simple text string "your_custom_error_message_here".






Running the server

The last step is to make sure the server will start without error. The following will start, stop or do a graceful restart of the apache server:

/usr/local/apache2/bin/apachectl -k start
/usr/local/apache2/bin/apachectl -k stop
/usr/local/apache2/bin/apachectl -k graceful

If you get an error, check the access_log and error_log in the /usr/local/apache2/logs/ directory. The modules we built are also configured to write their files to that directory.



Questions?

How can I test if the server is returning the correct headers?

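You can use telnet to connect to port 80 of the web server and type a GET, HEAD or POST request by hand to see what the server returns. Here we sent "HEAD / HTTP/1.1" without a Host header. Because HTTP/1.1 requires a Host header, Apache rejected the request with 400 Bad Request, as it is supposed to. Notice that the Server header shows the value set by SecServerSignature.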
## Test using telnet
username@machine:~$ telnet your_hostname.com 80 
Trying 10.20.30.40...
Connected to your_hostname.com (10.20.30.40).
Escape character is '^]'.
HEAD / HTTP/1.1

HTTP/1.1 400 Bad Request
Date: Mon, 10 Jan 2010 10:20:30 GMT
Server: your_hostname.com 
Connection: close
Content-Type: text/html; charset=iso-8859-1

Connection closed by foreign host.
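The same check can be scripted. The sketch below uses only Python's standard socket module and sends a HEAD request that includes the Host header HTTP/1.1 requires, so a correctly configured proxy should answer with a normal status line instead of 400 Bad Request. The function names are hypothetical, and your_hostname.com is the placeholder used throughout this page.

```python
import socket


def build_head_request(host: str, path: str = "/") -> bytes:
    """A minimal HTTP/1.1 HEAD request; HTTP/1.1 requires the Host header."""
    return (f"HEAD {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")


def head(host: str, port: int = 80, path: str = "/", timeout: float = 5.0) -> str:
    """Send the request and return the raw response headers as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")
```

For example, `print(head("your_hostname.com"))` prints the status line and headers, where you can confirm that the Server header carries the SecServerSignature value.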

How can I see what modules are loaded on my custom built Apache binary?

Using the "-M" argument lists all loaded modules, both those statically built into the binary (static) and those loaded with LoadModule (shared). Your list may look something like this.
[user@webserver ~]# /usr/local/apache2/bin/apachectl -M
Loaded Modules:
 core_module (static)
 authn_file_module (static)
 authn_default_module (static)
 authz_host_module (static)
 authz_groupfile_module (static)
 authz_user_module (static)
 authz_default_module (static)
 auth_basic_module (static)
 include_module (static)
 filter_module (static)
 log_config_module (static)
 env_module (static)
 expires_module (static)
 unique_id_module (static)
 setenvif_module (static)
 mpm_prefork_module (static)
 http_module (static)
 mime_module (static)
 asis_module (static)
 cgi_module (static)
 dir_module (static)
 actions_module (static)
 alias_module (static)
 rewrite_module (static)
 so_module (static)
 perl_module (shared)
 evasive20_module (shared)
 security2_module (shared)
Syntax OK
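To check a build against what this proxy needs, the output of apachectl -M can be parsed. The sketch assumes a hypothetical list of required modules drawn from this page's configure flags and LoadModule lines, plus proxy_balancer_module, which the balancer:// URLs depend on. Run against the sample listing above, it would report the proxy modules as missing, since that listing contains none of them.

```python
import re
import subprocess

# Hypothetical list of modules this proxy setup relies on.
REQUIRED = {
    "proxy_module", "proxy_http_module", "proxy_balancer_module",
    "rewrite_module", "expires_module", "unique_id_module",
    "evasive20_module", "security2_module",
}

# Matches lines such as " rewrite_module (static)" or " evasive20_module (shared)".
MODULE_LINE = re.compile(r"^\s*(\w+)\s+\((?:static|shared)\)\s*$", re.MULTILINE)


def loaded_modules(listing: str) -> set:
    """Module names from `apachectl -M` output, static or shared."""
    return set(MODULE_LINE.findall(listing))


def missing_modules(listing: str, required=REQUIRED) -> set:
    """Required modules that do not appear in the listing."""
    return set(required) - loaded_modules(listing)


def apachectl_listing(apachectl: str = "/usr/local/apache2/bin/apachectl") -> str:
    """Run `apachectl -M` and return everything it printed."""
    # Capture both streams; httpd may write parts of its report to stderr.
    result = subprocess.run([apachectl, "-M"], capture_output=True, text=True,
                            check=True)
    return result.stdout + result.stderr
```

On the proxy itself, `print(sorted(missing_modules(apachectl_listing())))` should print an empty list once everything is built and loaded.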





Questions, comments, or suggestions? Contact Calomel.org