Scaling Web Apps with Apache 2.2 and mod_proxy_balancer on Debian

26 07 2007

Using Apache as a reverse proxy to front application servers (like mod_jk with Tomcat) is a common pattern in web application architecture, including applications based on Weblogic or WebSphere. Weblogic, for instance, has a load balancer module for clustering app servers.

From version 2.2 onward, the mod_proxy module of the Apache server has been extended to support load balancing through the mod_proxy_balancer module. This seems to be a popular way of scaling for a lot of Rails installations. To implement this on Debian, here are the basic steps:

1. Install Apache2
To install Apache2 on Debian, use the apt-get command to install the apache2 package (as root). This step is pretty much automatic:
apt-get install apache2

2. Configure Required Modules
Apache2 on Debian has a slightly different layout. The files are in the /etc/apache2 directory and are divided into modules and sites:

domU-12-31-36-00-31-41:/etc/apache2# ls -alkn
total 60
drwxr-xr-x 7 0 0 4 2007-07-27 05:18 .
drwxr-xr-x 45 0 0 4 2007-07-26 22:41 ..
-rw-r--r-- 1 0 0 24 2007-03-27 12:53 apache2.conf
drwxr-xr-x 2 0 0 4 2007-07-17 07:57 conf.d
-rw-r--r-- 1 0 0 1 2007-03-27 12:58 envvars
-rw-r--r-- 1 0 0 0 2007-07-17 07:57 httpd.conf
drwxr-xr-x 2 0 0 4 2007-07-26 08:06 mods-available
drwxr-xr-x 2 0 0 4 2007-07-26 08:17 mods-enabled
-rw-r--r-- 1 0 0 1 2007-07-17 07:57 ports.conf
drwxr-xr-x 2 0 0 4 2007-07-26 08:08 sites-available
drwxr-xr-x 2 0 0 4 2007-07-26 08:08 sites-enabled

So instead of one monolithic httpd.conf file, the configuration is broken down into small file fragments, one per module, and symbolic links between the -available and -enabled directories allow quick configuration changes. To facilitate configuration, several commands are available: a2enmod, a2ensite, etc. Here is a good reference.
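
To make the symlink mechanism concrete, here is a rough sketch of what enabling a module amounts to. It runs in a scratch directory rather than /etc/apache2, and enable_mod is a made-up helper standing in for a2enmod, which I am assuming does little more than create these links:

```shell
# Sketch only: emulate the mods-available/mods-enabled layout in a scratch dir.
root=$(mktemp -d)
mkdir -p "$root/mods-available" "$root/mods-enabled"
: > "$root/mods-available/proxy.load"   # stand-ins for the real module files
: > "$root/mods-available/proxy.conf"

# enable_mod is hypothetical; it links a module's .load/.conf into mods-enabled.
enable_mod() {
  mod=$1
  for f in "$root/mods-available/$mod.load" "$root/mods-available/$mod.conf"; do
    [ -e "$f" ] || continue
    name=$(basename "$f")
    ln -sf "../mods-available/$name" "$root/mods-enabled/$name"
  done
}

enable_mod proxy
ls -l "$root/mods-enabled"
```

Disabling a module is then just a matter of removing the links again, which is why switching configurations is so quick.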

At a minimum, several modules need to be configured and enabled. The module names are basically the file basenames in the mods-available directory.

In this example, the application server nodes are running locally at different ports and we want to reverse proxy requests to these nodes via the load balancer. The configuration files are to be stored in the mods-available directory. Here are the sample configurations.
The balancer:
domU-12-31-36-00-31-41:/etc/apache2# more mods-available/proxy_balancer.conf
<Proxy balancer://app>
# cluster member
BalancerMember http://127.0.0.1:8080 loadfactor=1
BalancerMember http://127.0.0.1:8081 loadfactor=1
</Proxy>

The main config file:
domU-12-31-36-00-31-41:/etc/apache2# more sites-enabled/000-default
#NameVirtualHost *
<virtualhost *:80>
ServerAdmin webmaster@localhost
ProxyPass / balancer://app/
ProxyPassReverse / balancer://app/
ErrorLog /var/log/apache2/error.log
LogLevel warn
CustomLog /var/log/apache2/access.log combined
ServerSignature On
</virtualhost>

It’s important to note the trailing ‘/’ in the ProxyPass and ProxyPassReverse directives and how they bind to the cluster name (app) defined in the config file for the balancer (proxy_balancer.conf). Once the config files are ready, the modules need to be enabled.

3. Enable Modules
After the module configuration files are complete, the modules are enabled via:

a2enmod proxy
a2enmod proxy_balancer
a2enmod proxy_http

For some reason, proxy_http is frequently overlooked. Without it, Apache will throw 403 errors when accessing resources at proxied URLs, and error messages like the following show up in the logs:

[Thu Jul 26 07:06:13 2007] [warn] proxy: No protocol handler was valid for the URL /foo. If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule.

To show the loaded modules:

domU-12-31-36-00-31-41:/etc/apache2# apache2ctl -t -D DUMP_MODULES
Loaded Modules:
core_module (static)
log_config_module (static)
logio_module (static)
mpm_worker_module (static)
http_module (static)
so_module (static)
alias_module (shared)
auth_basic_module (shared)
authn_file_module (shared)
authz_default_module (shared)
authz_groupfile_module (shared)
authz_host_module (shared)
authz_user_module (shared)
autoindex_module (shared)
cache_module (shared)
cgid_module (shared)
dir_module (shared)
env_module (shared)
mime_module (shared)
negotiation_module (shared)
proxy_module (shared)
proxy_balancer_module (shared)
proxy_http_module (shared)
rewrite_module (shared)
setenvif_module (shared)
status_module (shared)
Syntax OK
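
Since a missing proxy_http is so easy to overlook, a small check over that listing can help. The following is only a sketch: missing_proxy_modules is a hypothetical helper, and it assumes the listing format shown above (one module name per line, followed by "(static)" or "(shared)"):

```shell
# Modules this setup depends on, as they are named in the DUMP_MODULES listing.
required_mods="proxy_module proxy_balancer_module proxy_http_module"

# Reads a module listing on stdin and prints every required module not found.
missing_proxy_modules() {
  listing=$(cat)
  for m in $required_mods; do
    # Anchor at line start so proxy_module does not match proxy_http_module.
    printf '%s\n' "$listing" | grep -q "^[[:space:]]*$m[[:space:]]" || echo "$m"
  done
}
```

It would be used as apache2ctl -t -D DUMP_MODULES 2>&1 | missing_proxy_modules, where empty output means all three modules are loaded.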

Once the server is restarted (apache2ctl -k restart), the changes will take effect and requests should now be routed to the application server nodes running on ports 8080 and 8081 as configured in the example.

The load balancer module is quite powerful: it has options for setting load factors to better balance load across nodes of varying capacity. For more information on configuration options, see the documentation.
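
As an illustration of load factors, here is a sketch that prints a balancer block from a list of nodes. gen_balancer is a hypothetical helper of my own, and the 2:1 weighting is an assumed example rather than part of the setup above. With the default request-counting method, a member with loadfactor=2 should receive roughly twice as many requests as one with loadfactor=1:

```shell
# gen_balancer is a hypothetical helper, not part of Apache or Debian.
# It reads "url loadfactor" pairs on stdin and prints a <Proxy> balancer block.
gen_balancer() {
  name=$1
  printf '<Proxy balancer://%s>\n' "$name"
  while read -r url factor; do
    [ -n "$url" ] || continue                      # skip blank lines
    printf 'BalancerMember %s loadfactor=%s\n' "$url" "${factor:-1}"
  done
  printf '</Proxy>\n'
}

# Assumed example: the node on port 8080 has twice the capacity of 8081.
gen_balancer app <<'EOF'
http://127.0.0.1:8080 2
http://127.0.0.1:8081 1
EOF
```

The output can be reviewed and then copied into mods-available/proxy_balancer.conf by hand.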



Running Amazon EC2

14 07 2007

So I was told the other day that I had 48 hours to migrate one of my older AMI (Amazon Machine Image) instances, because it was running on degraded hardware. Amazon was proactive about it and had stopped billing for this particular instance. In any case, I was able to log in to the instance and retrieve the files specific to it.

One great thing about EC2 is that there is a wide selection of public AMIs available. This means one can experiment with different system configurations, from the different Linux distributions (e.g. Debian, Fedora, Gentoo) to the software installed (e.g. a full LAMP stack). Since my old instance has been running the Amazon-provided Fedora4, I figure it’s time to try something else and also document the steps.

1. Getting the certificate and private key ready
Amazon provides a set of command-line tools (written in Java) for working with EC2. These tools use HTTPS to communicate with the web service and therefore require an X.509 certificate and private key. Once you are in the EC2 program (currently Beta at aws.amazon.com), you can generate the required certificate and key. Since I had already generated mine, there’s little to do here. Amazon recommends putting these credential files in, say, ~/.ec2, like so:

$ ls -alkn ~/.ec2
total 12
drwx------ 5 501 501 170 Jul 14 15:05 .
drwxr-xr-x 53 501 501 1802 Jul 14 14:11 ..
-rw-r--r-- 1 501 501 689 Jul 14 14:11 cert-A6O5VGEIFPYKTCNTXVK4D2XE5ESNCB7U.pem
-rw-r--r-- 1 501 501 721 Jul 14 14:11 pk-A6O5VGEIFPYKTCNTXVK4D2XE5ESNCB7U.pem

2. Install EC2 Tools
The command line tools are available from the Developer Connection site (here). These tools also expect Java to be installed on your machine. Once installed, the tools are ready for use after setting a few environment variables. For example, if the downloaded zip file unzips to a directory foo (~/projects/ec2/ec2-api-2007-03-01/ec2-api-tools in my case), the environment variables are set as follows:

export EC2_HOME=~/projects/ec2/ec2-api-2007-03-01/ec2-api-tools
export PATH=$EC2_HOME/bin:$PATH
export EC2_PRIVATE_KEY=~/.ec2/pk-A6O5VGEIFPYKTCNTXVK4D2XE5ESNCB7U.pem
export EC2_CERT=~/.ec2/cert-A6O5VGEIFPYKTCNTXVK4D2XE5ESNCB7U.pem

That’s it!
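
If the tools complain later, a quick sanity check of these variables can save some head-scratching. This is a sketch only: check_ec2_env is a hypothetical helper, and it only verifies that the paths exist, not that the credentials are valid:

```shell
# check_ec2_env is a hypothetical helper: it reports variables that do not
# point at something usable and returns non-zero if any check fails.
check_ec2_env() {
  status=0
  if [ -z "${EC2_HOME:-}" ] || [ ! -d "$EC2_HOME/bin" ]; then
    echo "EC2_HOME/bin not found (EC2_HOME=${EC2_HOME:-unset})"
    status=1
  fi
  for var in EC2_PRIVATE_KEY EC2_CERT; do
    eval "path=\${$var:-}"          # look up the variable named by $var
    if [ ! -r "$path" ]; then
      echo "$var is not a readable file (${path:-unset})"
      status=1
    fi
  done
  return $status
}
```

Running check_ec2_env after the export lines should print nothing and return 0 when everything is in place.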

3. Finding public AMIs, checking instances
Several commands are useful:

ec2-describe-images -a shows a list of available AMIs. This has grown to be a pretty long list since the early days of EC2. The information is also available at http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=101

Since this new instance will be serving web pages with a Java and S3 backend, a small AMI optimized for Xen would be ideal. For this, I decided to try out the Gentoo distribution (ami-8b8a6fe2). This particular AMI has the basics like openssh and Apache2 and that’s about it.

ec2-describe-instances shows any currently running instances. There is also a Firefox plugin that simplifies many of these tasks by providing a nice UI. Still, command line tools are scriptable…

4. Running a new AMI instance
With the tools in place and the cert/keys ready, starting up a new instance is pretty trivial. First, we need to generate a keypair for the new instance. An instance of a public AMI has no password, and logging in via ssh requires a public/private keypair. Since this is specific to my instantiations of the AMI, a keypair needs to be generated so that one half is embedded in the instance, allowing me to log in later with the other half. To do this, simply use the command:

ec2-add-keypair ami-8b8a6fe2-gentoo-base-eminent

The key name ami-8b8a6fe2-gentoo-base-eminent follows a simple convention that denotes the AMI (ami-8b8a6fe2) and the configuration (gentoo-base-eminent). The key name is needed later when starting up new instances of this AMI. The ec2-add-keypair command prints out the private RSA key necessary for ssh login (via the -i option) later:

-----BEGIN RSA PRIVATE KEY-----
MIIEpgIBAAKCAQEAmmVOcPrBRXgGbo3XtvKxld/Glmuqi9gGKLNzyfUspKCuSjwmgHB91y7e8aH+
tGyHdbYnHPC/nNbh15F3jjdneM5W1GphcUJu4m2HylAklgTOC8pYVdS8XacKiGSBaUXvZimXCsH/
Uzcm3rxfxwNESwWpsg9aPXYi//T0quqM1xvZNFXO1s1s5ZJfKugCUUJrq365afaOR1hiipx+02U5
zKSTYZc9XWKbbaNSSeIDCPh8CZTxEH/FEuutaMxisMJ26uAqD0plnc1sj+mv8NNCl+/XgTlPLzVg
...
-----END RSA PRIVATE KEY-----

Since this output needs to be captured in a file, we can just do this:

ec2-add-keypair ami-8b8a6fe2-gentoo-base-eminent > ~/.ec2/ami-8b8a6fe2-gentoo-base-eminent.id
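
A slightly more careful variation on this redirect, as a sketch: save_private_key is a hypothetical helper that keeps only the PEM block (on the assumption that the tool may print a summary line before the key) and creates the file under a restrictive umask so it is never world-readable:

```shell
# save_private_key is a hypothetical helper. It reads the command output on
# stdin, keeps only the BEGIN/END RSA PRIVATE KEY block, and writes it to $1.
save_private_key() {
  dest=$1
  (
    umask 077   # a newly created file gets mode 600; an existing file keeps its mode
    sed -n '/-----BEGIN RSA PRIVATE KEY-----/,/-----END RSA PRIVATE KEY-----/p' > "$dest"
  )
}
```

It would be used as ec2-add-keypair ami-8b8a6fe2-gentoo-base-eminent | save_private_key ~/.ec2/ami-8b8a6fe2-gentoo-base-eminent.id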

Change the permissions of the id file (chmod 600 ~/.ec2/ami-8b8a6fe2-gentoo-base-eminent.id) or ssh won’t like it! Now that we have the keypair, start up a new instance:

ec2-run-instances ami-8b8a6fe2 -k ami-8b8a6fe2-gentoo-base-eminent

This starts up one instance of the Gentoo AMI (ami-8b8a6fe2) with the named keypair (ami-8b8a6fe2-gentoo-base-eminent). The instance is started in the default group. In order to gain ssh access, we need to authorize port access (for the default group):

ec2-authorize default -p 22 for ssh, and ec2-authorize default -p 80 for HTTP.

5. Connecting to the new AMI instance
Now to connect to the new instance, we first find out how to get to it:

$ ec2-describe-instances
...
INSTANCE i-8d688be4 ami-8b8a6fe2 ec2-72-44-51-245.z-1.compute-1.amazonaws.com ...

To connect to it, simply ssh to the hostname listed above:

ssh -i ~/.ec2/ami-8b8a6fe2-gentoo-base-eminent.id root@ec2-72-44-51-245.z-1.compute-1.amazonaws.com

That’s it!
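
Since the command line tools are scriptable, the hostname lookup can be automated. A minimal sketch, assuming the field order in the sample line above (INSTANCE, instance id, AMI id, public hostname); instance_hosts is a hypothetical helper:

```shell
# instance_hosts is a hypothetical helper: it reads ec2-describe-instances
# output on stdin and prints the fourth field of every INSTANCE line.
instance_hosts() {
  awk '$1 == "INSTANCE" { print $4 }'
}
```

With a single running instance, something like ssh -i ~/.ec2/ami-8b8a6fe2-gentoo-base-eminent.id root@$(ec2-describe-instances | instance_hosts) would then connect without copying the hostname by hand.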