Scaling Web Apps with Apache 2.2 and mod_proxy_balancer on Debian

26 07 2007

Using Apache as a reverse proxy in front of application servers (for example, mod_jk with Tomcat) is a common pattern in web application architecture, including for applications based on WebLogic or WebSphere. WebLogic, for instance, has a load balancer module for clustering app servers.

From version 2.2 onward, the mod_proxy module of the Apache server has been extended to support load balancing through the mod_proxy_balancer module. This seems to be a popular way of scaling for a lot of Rails installations. To implement this on Debian, here are the basic steps:

1. Install Apache2
To install Apache2 on Debian, use the apt-get command to install the apache2 package. This step is pretty much automatic:
apt-get install apache2

2. Configure Required Modules
Apache2 on Debian lays out its configuration slightly differently from a stock Apache install. The files are in the /etc/apache2 directory and are divided into modules and sites:

domU-12-31-36-00-31-41:/etc/apache2# ls -alkn
total 60
drwxr-xr-x 7 0 0 4 2007-07-27 05:18 .
drwxr-xr-x 45 0 0 4 2007-07-26 22:41 ..
-rw-r--r-- 1 0 0 24 2007-03-27 12:53 apache2.conf
drwxr-xr-x 2 0 0 4 2007-07-17 07:57 conf.d
-rw-r--r-- 1 0 0 1 2007-03-27 12:58 envvars
-rw-r--r-- 1 0 0 0 2007-07-17 07:57 httpd.conf
drwxr-xr-x 2 0 0 4 2007-07-26 08:06 mods-available
drwxr-xr-x 2 0 0 4 2007-07-26 08:17 mods-enabled
-rw-r--r-- 1 0 0 1 2007-07-17 07:57 ports.conf
drwxr-xr-x 2 0 0 4 2007-07-26 08:08 sites-available
drwxr-xr-x 2 0 0 4 2007-07-26 08:08 sites-enabled

So instead of one monolithic httpd.conf file, the configuration is broken down into small fragment files, one per module or site, and symbolic links allow quick configuration changes. To facilitate configuration, several commands are available: a2enmod, a2ensite, etc. Here is a good reference.
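
The symlink mechanism can be illustrated in a scratch directory. This is only a sketch of roughly what a2enmod and a2dismod do; the temporary directory and the placeholder file contents are assumptions for illustration, and nothing here touches the real /etc/apache2:

```shell
#!/bin/sh
# Sketch of the available/enabled symlink layout in a scratch directory.
set -e
demo=$(mktemp -d)
mkdir -p "$demo/mods-available" "$demo/mods-enabled"

# A module is described by a .load file (and optionally a .conf file).
echo "# placeholder for a LoadModule line" > "$demo/mods-available/proxy.load"

# "Enabling" amounts to linking the file into mods-enabled ...
ln -s ../mods-available/proxy.load "$demo/mods-enabled/proxy.load"
ls -l "$demo/mods-enabled"

# ... and "disabling" amounts to removing the link; the original file stays put.
rm "$demo/mods-enabled/proxy.load"
ls "$demo/mods-available"
```

Because only the link is added or removed, a module's configuration survives being disabled and can be switched back on later without rewriting anything.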

At a minimum, several modules need to be configured and enabled. The module names are basically the file basenames in the mods-available directory.

In this example, the application server nodes are running locally at different ports and we want to reverse proxy requests to these nodes via the load balancer. The configuration files are to be stored in the mods-available directory. Here are the sample configurations.
The balancer:
domU-12-31-36-00-31-41:/etc/apache2# more mods-available/proxy_balancer.conf
<Proxy balancer://app>
# cluster member
BalancerMember http://127.0.0.1:8080 loadfactor=1
BalancerMember http://127.0.0.1:8081 loadfactor=1
</Proxy>

The main config file:
domU-12-31-36-00-31-41:/etc/apache2# more sites-enabled/000-default
#NameVirtualHost *
<virtualhost *:80>
ServerAdmin webmaster@localhost
ProxyPass / balancer://app/
ProxyPassReverse / balancer://app/
ErrorLog /var/log/apache2/error.log
LogLevel warn
CustomLog /var/log/apache2/access.log combined
ServerSignature On
</virtualhost>

It’s important to note the use of ‘/’ in the ProxyPass and ProxyPassReverse directives and how they bind to the cluster name defined in the config file for the balancer (proxy_balancer.conf).

3. Enable Modules
Once the module configuration files are complete, the modules are enabled via:

a2enmod proxy
a2enmod proxy_balancer
a2enmod proxy_http

For some reason, proxy_http is frequently overlooked. Without it, Apache will throw 403 errors when accessing resources at proxied URLs, and the logs will show error messages like:

[Thu Jul 26 07:06:13 2007] [warn] proxy: No protocol handler was valid for the URL /foo. If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule.
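
One way to catch a forgotten module before restarting is to check for its link in mods-enabled. The helper below is a minimal sketch; missing_mods is a hypothetical function name, not part of the Debian apache2 package, and it assumes each enabled module has a .load entry in the directory it is given:

```shell
#!/bin/sh
# missing_mods DIR MOD...: print each MOD that has no DIR/MOD.load entry.
# Hypothetical helper for illustration only.
missing_mods() {
    dir=$1
    shift
    for mod in "$@"; do
        [ -e "$dir/$mod.load" ] || echo "$mod"
    done
}

# Usage on a system laid out as in this post:
#   missing_mods /etc/apache2/mods-enabled proxy proxy_balancer proxy_http
```

An empty result means all three modules are linked in; any name printed still needs a2enmod.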

To show the loaded modules:

domU-12-31-36-00-31-41:/etc/apache2# apache2ctl -t -D DUMP_MODULES
Loaded Modules:
core_module (static)
log_config_module (static)
logio_module (static)
mpm_worker_module (static)
http_module (static)
so_module (static)
alias_module (shared)
auth_basic_module (shared)
authn_file_module (shared)
authz_default_module (shared)
authz_groupfile_module (shared)
authz_host_module (shared)
authz_user_module (shared)
autoindex_module (shared)
cache_module (shared)
cgid_module (shared)
dir_module (shared)
env_module (shared)
mime_module (shared)
negotiation_module (shared)
proxy_module (shared)
proxy_balancer_module (shared)
proxy_http_module (shared)
rewrite_module (shared)
setenvif_module (shared)
status_module (shared)
Syntax OK

Once the server is restarted (apache2ctl -k restart), the changes will take effect and requests should now be routed to the application server nodes running on ports 8080 and 8081 as configured in the example.

The load balancer module is quite powerful: it has options for setting load factors to better balance load across nodes of varying capacity. For more information on configuration options, see the documentation.
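
As a sketch of uneven weighting, the snippet below drafts a variant of the balancer config in a scratch file for review. The 2:1 weighting is a hypothetical choice, assuming the node on port 8080 has roughly twice the capacity of the one on 8081; the draft would still need to be copied into mods-available/proxy_balancer.conf by hand:

```shell
#!/bin/sh
# Draft a weighted balancer config in a scratch file for review.
set -e
draft=$(mktemp)
cat > "$draft" <<'EOF'
<Proxy balancer://app>
# Hypothetical weighting: the node on 8080 is assumed to have roughly
# twice the capacity of the node on 8081.
BalancerMember http://127.0.0.1:8080 loadfactor=2
BalancerMember http://127.0.0.1:8081 loadfactor=1
</Proxy>
EOF
cat "$draft"
```

With the default request-counting method, a member with loadfactor=2 should receive about twice as many requests as a member with loadfactor=1.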