29 May 2008

Apache + SSL + Name-Based-Virtual-Hosts

You might have stumbled onto this page looking for a success story that solves the riddle in this post's title. Sorry. That's not the case.

Rather, I'm going to write about the problem, my efforts at workarounds, and finally, the recommended course(s) of action.

The Problem

My Apache web server fronts several applications. The server has several host names aliased to its single IP address. I would like to keep each application in its own virtual host, use name-based virtual hosting, and run some or all of them over SSL.

The Findings

Before doing any background reading, I assumed this would be trivial for Apache. It is not, and here's why.

When a request is made for an https URL (TCP port 443), HTTP actually runs inside an SSL layer. The SSL tunnel must be established first, and only then can HTTP flow through it. While the tunnel is being set up, the server has to present its certificate. At that point SSL has no access to the HTTP "Host" header, so it cannot know which name-based virtual host, and therefore which certificate, to use.
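To make the ordering concrete, here is a toy Python model. It is my own illustration, not Apache's code, and the addresses, host names, and paths are hypothetical. The certificate must be chosen in step 1, using only the IP address and port. The Host header becomes readable only in step 2, so every name on that address gets whatever certificate step 1 picked.

```python
# Toy model (not real SSL or Apache code) of what an HTTPS server knows, and when.

# Hypothetical setup: one IP address, one SSL port, several host names.
CERT_BY_ADDRESS = {
    ("192.0.2.10", 443): "appserver.example.org.crt",
}

DOCROOT_BY_NAME = {
    "feedback.example.org": "/srv/feedback",
    "outfitter.example.org": "/srv/outfitter",
    "delta.example.org": "/srv/delta",
}


def choose_certificate(local_ip, local_port):
    """Step 1, the SSL handshake: only the socket's address and port are known."""
    # No HTTP bytes have been decrypted yet, so there is no Host header to look at.
    return CERT_BY_ADDRESS[(local_ip, local_port)]


def choose_docroot(decrypted_request):
    """Step 2, inside the finished tunnel: the Host header is finally readable."""
    # This sketch ignores a ":port" suffix on the Host value.
    for line in decrypted_request.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return DOCROOT_BY_NAME[value.strip()]
    raise ValueError("request has no Host header")


cert = choose_certificate("192.0.2.10", 443)
docroot = choose_docroot("GET / HTTP/1.1\r\nHost: delta.example.org\r\n\r\n")
print(cert, docroot)
```

In this model, the workarounds discussed later simply give step 1 a different key: a separate IP address or a separate port per site.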

Here's a much better explanation from the folks at Apache:
http://httpd.apache.org/docs/2.2/ssl/ssl_faq.html#vhosts

The bottom line is that they say SSL + Name-Based-Virtual-Hosts cannot be made to work.

Naturally, I didn't believe it (and neither did my manager). So I set out trying all sorts of hacks. All failed, some miserably.

Ok, so now what?

The last line of the SSL FAQ link above provides a direction: "Using separate IP addresses for different SSL hosts. Using different port numbers for different SSL hosts."


The Conclusion(s)

Option A - Use separate IP addresses for different SSL hosts.

The idea here is that you can continue to use virtual hosts, but they will be IP-based rather than name-based. That seems to call for multiple NICs in your server, one per address. Adding NICs won't scale well, but there is a better alternative, especially if you use Linux. You can use one NIC and create IP aliases on it (eth0:1, eth0:2, and so on), each with its own unique IP address. This works great and is a breeze to implement.
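Here is a minimal sketch of Option A: a small Python script that prints the shell commands and Apache configuration rather than applying them. Everything in it is assumed for illustration: the interface name eth0, the 192.0.2.x documentation addresses, the /24 netmask, the host names, and the certificate and document-root paths. The `ip addr add ... label eth0:N` form is the iproute2 way to create eth0:1-style aliases. On older systems, `ifconfig eth0:1 ...` does the same job.

```python
# Hypothetical site list: each SSL site gets its own aliased IP address on one NIC.
SITES = [
    ("feedback.example.org", "192.0.2.11"),
    ("outfitter.example.org", "192.0.2.12"),
]


def alias_commands(sites, device="eth0"):
    """One iproute2 command per site, adding an extra address (alias) to one NIC."""
    return [
        f"ip addr add {ip}/24 dev {device} label {device}:{n}"
        for n, (_, ip) in enumerate(sites, start=1)
    ]


def ssl_vhost(name, ip):
    """An IP-based SSL virtual host: Apache selects it by address, not by name."""
    return "\n".join([
        f"<VirtualHost {ip}:443>",
        f"    ServerName {name}",
        "    SSLEngine on",
        f"    SSLCertificateFile /etc/ssl/certs/{name}.crt",
        f"    SSLCertificateKeyFile /etc/ssl/private/{name}.key",
        f"    DocumentRoot /srv/{name}",
        "</VirtualHost>",
    ])


for cmd in alias_commands(SITES):
    print(cmd)
for name, ip in SITES:
    print(ssl_vhost(name, ip))
```

Each alias address still needs its own DNS record. Apache must also already be listening on port 443, typically via a `Listen 443` line in the SSL configuration.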

The downside here is that externally available IP addresses are scarce and, therefore, costly for most companies. I don't have a good solution for that, other than perhaps looking into IPv6, but that is a topic in itself.

Option B - Use different port numbers for different SSL hosts.

This solution is exactly what it seems, but worse.
You would run each SSL site on an arbitrary, unused port number such as 444, 445, 446, etc. (443 is the standard HTTPS port).

This option doesn't use the aliased host names at all. Instead, you have a single SSL certificate and SSL key for the server. All https requests go to the one host name, and the virtual hosts are distinguished by the requested port number.

For example,
https://appserver.example.org:444/
https://appserver.example.org:445/
https://appserver.example.org:446/

This just looks clumsy to me. Customers would balk.
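For completeness, here is roughly what Option B looks like as Apache configuration, printed by a small Python script. The application-to-port mapping, certificate paths, and document roots are hypothetical. Every port needs its own `Listen` line, and every virtual host reuses the same certificate and key.

```python
# Hypothetical mapping of application to SSL port (443 stays the default site).
PORTS = {"feedback": 444, "outfitter": 445, "delta": 446}

SERVER_NAME = "appserver.example.org"


def port_vhosts(ports, server_name=SERVER_NAME):
    """One Listen line plus one <VirtualHost *:PORT> block per application."""
    blocks = []
    for app, port in sorted(ports.items(), key=lambda item: item[1]):
        blocks.append("\n".join([
            f"Listen {port}",
            f"<VirtualHost *:{port}>",
            f"    ServerName {server_name}:{port}",
            "    SSLEngine on",
            # The same certificate and key are reused on every port.
            f"    SSLCertificateFile /etc/ssl/certs/{server_name}.crt",
            f"    SSLCertificateKeyFile /etc/ssl/private/{server_name}.key",
            f"    DocumentRoot /srv/{app}",
            "</VirtualHost>",
        ]))
    return "\n\n".join(blocks)


print(port_vhosts(PORTS))
```

The extra ports also have to be opened in any firewall between the server and its users, which is one more reason this option tends to be unpopular.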

Option C - Use a single IP address but unique Locations.

This is a variant of Option B above but maybe more sane.

One would have a single SSL-enabled address such as https://appserver.example.org/. You would then put each of your applications at a different "location."

For example,
https://appserver.example.org/feedback (the location is /feedback)
https://appserver.example.org/outfitter (the location is /outfitter)
https://appserver.example.org/delta (the location is /delta)

Then, in Apache, you would define a Location block for each application inside the one SSL virtual host. So these are not separate virtual hosts, just Location blocks.
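Here is a sketch of Option C under one loud assumption: each application runs as its own back-end server on a local port, and Apache reverse-proxies to it with mod_proxy (plus mod_proxy_http). If the applications are static directories or run inside Apache, the body of each `<Location>` would look different. The ports, paths, and function name are hypothetical, and the Python only prints the configuration.

```python
# Hypothetical back ends: each application listens on its own local port.
BACKENDS = {
    "/feedback": "http://127.0.0.1:8001",
    "/outfitter": "http://127.0.0.1:8002",
    "/delta": "http://127.0.0.1:8003",
}


def location_vhost(backends, server_name="appserver.example.org"):
    """A single SSL virtual host with one <Location> block per application."""
    lines = [
        "<VirtualHost *:443>",
        f"    ServerName {server_name}",
        "    SSLEngine on",
        f"    SSLCertificateFile /etc/ssl/certs/{server_name}.crt",
        f"    SSLCertificateKeyFile /etc/ssl/private/{server_name}.key",
    ]
    for path, backend in backends.items():
        lines += [
            f"    <Location {path}>",
            # Inside <Location>, ProxyPass/ProxyPassReverse take only the back-end URL;
            # the local path comes from the Location itself.
            f"        ProxyPass {backend}",
            f"        ProxyPassReverse {backend}",
            "    </Location>",
        ]
    lines.append("</VirtualHost>")
    return "\n".join(lines)


print(location_vhost(BACKENDS))
```

The local paths and back-end URLs are both written without trailing slashes so the two sides of each mapping match.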

The drawback here is that this solution doesn't work well for hosting a customer's application when the customer wants it under its own host name.

In my opinion, this is clean for a company's internal applications but that's about it.

Appendix

Other directions that I investigated included packet mangling and/or NAT using Linux's iptables. Netfilter and iptables are awesome and are useful for more than stateful firewalling. Alas, they either didn't solve the problem or quickly became far too complicated, especially when Option A above worked cleanly.

I also briefly set up nginx and glanced at Squid for their ability to act as proxy servers in front of the Apache servers. Nginx especially looked fantastic, but it didn't seem able to overcome the SSL + Name-Based-Virtual-Host issue either. The issue seems to be a limitation of the protocol rather than of any particular application.

I looked into a lot of dark corners and tried out a lot of things in researching this topic, most of which I didn't even mention. Feel free to comment with follow-up or questions.

21 May 2008

MySQL Installer

I spent much of the last week fiddling with Inno Setup to create a MySQL installer for win32. The installer asks for the program install directory and the location of the data directory, suggesting a default for each. It installs the MySQL server, the ODBC driver, and the GUI tools independently, based on check boxes. That's it. It's pretty slick.

Why not just use the .msi installers that MySQL provides, you ask? Well, my installer is going out to dozens of sites, and the configuration is set beforehand and bound to our commercial application. The MySQL .msi installers ask dozens of configuration questions. We want to cut that down and prevent the sysadmins who install our application from making wrong choices. For example, we must run only the InnoDB engine and never the non-transactional MyISAM engine. The MySQL .msi installer offers that choice; ours does not. My installer also automatically configures users, passwords, databases, tables, and access.
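As a rough illustration of the kind of preset an installer like this might write, here is a Python sketch that renders a minimal my.ini `[mysqld]` section. The function name and paths are hypothetical. `basedir`, `datadir`, and `default-storage-engine` are standard MySQL server options.

```python
def render_my_ini(basedir, datadir):
    """Render a preset [mysqld] section that makes InnoDB the default engine."""
    # MySQL option files treat backslash as an escape character, so Windows
    # paths are written with forward slashes instead.
    basedir = basedir.replace("\\", "/")
    datadir = datadir.replace("\\", "/")
    return "\n".join([
        "[mysqld]",
        f'basedir="{basedir}"',
        f'datadir="{datadir}"',
        "default-storage-engine=InnoDB",
        "",
    ])


print(render_my_ini(r"C:\Program Files\MySQL", r"D:\mysql-data"))
```

`default-storage-engine` only sets the default. It does not stop someone from explicitly creating a MyISAM table, so enforcing "InnoDB only" takes more than this one line.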

Sadly, this work invalidates my company's decision to standardize on PostgreSQL. Management decided to opt for short-term expediency instead of long-term robustness and scalability. Brilliant. (sarcasm) That won't come back to haunt them. (dripping sarcasm)

13 May 2008

Inno Setup

I have been tasked at work with writing a one-click MySQL installer for win32. Luckily, I have the awesome and amazing Inno Setup packaging tool at my disposal. Inno Setup is full-featured and capable; however, it isn't easy to use for complicated packaging scenarios. One has to discover and use helper scripts to simplify tasks such as decompression and manipulating environment variables. My secret weapon is access to a great set of example scripts that a former colleague wrote a few years ago. He was a super bright guy, but it still took him months to produce our main Inno Setup script. That just shows the power and complexity of the tool. I expect it'll take me 3-5 more days to get the MySQL installer just right, with the proper configuration and the addition of ODBC and the GUI tools.

08 May 2008

SSH port forwarding, gateway ports, and timeouts

We have a client that firewalls its network off from us except for TCP port 22 (SSH) from one particular address: our development server. Our goal is to be able to use the web application we developed for them, as deployed on their site, so that we can check the effect of bug fixes on the live system.

Enter ssh...

I'm not going to talk much about ssh and its better-known capabilities because that's going to be all over the web anyway. Really, go see for yourself.

Instead, we log into our local development server and issue this command:

ssh -g -f -L 8082:localhost:80 user@w.x.y.z sleep 60

where w.x.y.z represents an IP address of the client's server.

That command forwards connections made to our server's port 8082, through the SSH connection, to port 80 on w.x.y.z. The "localhost" in the -L argument is resolved on the remote end. The -f flag tells ssh to go into the background, and -g binds port 8082 on all interfaces rather than just loopback. That means anyone who can reach our server's port 8082 can use the tunnel, but our development server is behind two firewalls, so I'm not worried.

You can then log out of the development server since the ssh tunnel is backgrounded. Open a web browser on your local box, point it to the development server's port 8082, and you'll be seeing the client's web application. Nifty!

Finally, there's a "sleep" command at the end. That's just an example of ssh's ability to run remote commands. The point is that the tunnel stays up for at least the sleep duration (60 seconds here). If another application is still using the tunnel when the sleep finishes, the tunnel does not close; ssh waits for that connection to end. Once the sleep duration has passed and nothing is using the tunnel, the tunnel closes automatically.
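If you open this tunnel often, a tiny wrapper saves retyping. Here is a minimal Python sketch. The function names `tunnel_command` and `open_tunnel` are hypothetical, and the defaults simply mirror the command above. It assumes `ssh` is on the PATH and builds exactly the same invocation, so ssh still does all the work.

```python
import subprocess


def tunnel_command(remote, local_port=8082, remote_port=80, linger_seconds=60):
    """Build the argv for a backgrounded, self-closing ssh tunnel."""
    return [
        "ssh",
        "-g",  # bind the forwarded port on all interfaces, not just loopback
        "-f",  # go to the background after authenticating
        "-L", f"{local_port}:localhost:{remote_port}",  # 'localhost' is resolved remotely
        remote,
        "sleep", str(linger_seconds),  # remote command that sets the minimum lifetime
    ]


def open_tunnel(remote, **options):
    """Run the command; with -f, ssh backgrounds itself and this call returns."""
    subprocess.run(tunnel_command(remote, **options), check=True)
```

For example, `open_tunnel("user@w.x.y.z", linger_seconds=600)` would keep the tunnel available for at least ten minutes.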

That almost works for testing the web application, except that HTTP connections are short-lived. The browser opens a connection for a request, and the connection is closed soon after the response arrives. (Actually, it takes a few seconds for the socket to go away.) For all practical purposes, once the sleep has expired you have a 5-10 second inactivity timeout, whatever you set "sleep" to, as long as only short-lived HTTP connections are using the tunnel. Hopefully, that'll be enough time for you to test the bug fix on the client's box! You can always set the sleep duration much higher. Remember, the auto-close only takes effect once the sleep interval has passed.

I got the basics for this auto-timeout solution from http://www.g-loaded.eu/2006/11/24/auto-closing-ssh-tunnels/.

07 May 2008

Better XML needed in Python

I must say that I have just about lost patience with the heroic maintainers and developers of Python XML libraries. I'm grateful to have those tools at my disposal, but I'm frustrated that so many key ones seem to languish or remain somewhat incomplete. For example, I love elementtree, but its XPath support is still incomplete. I really need XPath and have to use 4Suite to get it. I also use 4Suite for applying style sheets. 4Suite is good but just awkward enough that it's no joy to use. It went a long while without any updates and sat at beta status seemingly forever. Then there's ZSI for SOAP. Well, SOAP is pretty unhappy stuff anyway, so you just have to make the best of a bad situation.
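For readers who haven't hit this wall themselves, here is a small sketch of where ElementTree's XPath subset stops. It uses the `xml.etree.ElementTree` module bundled with current Python, which supports more than the version available when this was written, and the sample XML is made up. `find()` and `findall()` accept paths, attribute and child predicates, and positions. XPath functions such as `count()` are not supported, so that part has to be done in plain Python.

```python
import xml.etree.ElementTree as ET

# Made-up sample document.
DOC = """<orders>
  <order id="1" status="open"><item sku="A"/><item sku="B"/></order>
  <order id="2" status="closed"><item sku="C"/></order>
  <order id="3" status="open"/>
</orders>"""

root = ET.fromstring(DOC)

# Supported: descendant search with an attribute predicate.
open_ids = [o.get("id") for o in root.findall(".//order[@status='open']")]

# Supported: positional predicate (1-based) and "has a child with this tag".
first_order = root.find("order[1]").get("id")
with_items = [o.get("id") for o in root.findall("order[item]")]

# Not supported: XPath functions like count(), so the equivalent of
# count(//order[@status='open']/item) is computed in Python instead.
open_item_count = sum(len(o.findall("item")) for o in root.findall("order[@status='open']"))

print(open_ids, first_order, with_items, open_item_count)
```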

Believe it or not, I have to use elementtree, 4Suite, and ZSI on a single project. Oh well, at least I was able to ditch PyXML on that project.

Thanks for the rant space.