Backyard Bamboo: 2008

19 December 2008

Native Postgresql Data Types for ActiveRecord

In researching ActiveRecord migrations, I came across this bit of new information in their documentation:

Instantiates a new column for the table. The type parameter is normally one of the migrations native types, which is one of the following: :primary_key, :string, :text, :integer, :float, :decimal, :datetime, :timestamp, :time, :date, :binary, :boolean.

You may use a type not in this list as long as it is supported by your database (for example, "polygon" in MySQL), but this will not be database agnostic and should usually be avoided.

So, what? Well, this came in handy immediately because I was creating a table that used Postgresql's native inet data type. I plan to use some of the related inet functionality at a later date so this is a big win.

Here's an abbreviated version of my migration that uses the inet datatype. I am running Ruby on Rails 2.2.


class CreateDevices < ActiveRecord::Migration
  def self.up
    create_table :devices do |t|
      t.column("ip_address", :inet) # this works beautifully
      # t.inet :ip_address # this won't work

      t.timestamps
    end
  end

  def self.down
    drop_table :devices
  end
end

Notice that one has to be explicit and use the column method and not the shortcut.

18 December 2008

Ruby's native postgres gem on Windows

The ruby-postgres gem is a compiled and fast adapter. It has some dependencies for Windows, though, so follow these steps to get it running properly:

go to the http://www.postgresql.org/ftp/binary/ and navigate to the release that matches the Postgres server that you installed.
look for and download a file that follows the pattern postgresql-8.X-binaries-no-installer.zip where the X corresponds to the database release
unzip the file and navigate to its bin subdirectory
copy the following files into your ruby installation's bin directory
- comerr32.dll
- krb5_32.dll
- k5sprt32.dll
- libeay32.dll
- libpq.dll
- libiconv2.dll
- libintl3.dll
- ssleay32.dll
- gssapi32.dll
after you have done all that, you should be able to open an interactive ruby session (irb) and type require 'postgres' and get 'true' as a result, i.e., a non-error

17 December 2008

Pylons and SQLAlchemy

I recently gave Pylons and SQLAlchemy a spin. I mainly wanted to check out SQLAlchemy so that I could evaluate it for upcoming projects. Well, my opinion is that it is wildly flexible and, therefore, came with a steep learning curve. Pylons wasn't too hard to grasp although it seemed sort of arcane at times. I genuinely liked Pylons though.

This has been a disappointing experience for me. I wanted Pylons and SQLAlchemy to be fantastic so that I'd have a compelling reason to come back to Python full-time from Ruby. Not the case, I'm afraid. Looks like I'm going to be spending more time in the Ruby and Ruby on Rails camps in the future.

Ubuntu destop is so broken...

I've had it with Linux on the desktop.

Things that didn't work...

my office's Java based SSL VPN
flickr uploader
all sound and, therefore, dependent applications like Skype
and more...

To be fair, the issue is mostly with my computer's particular hardware. For example, my computer's NVidia card driver constantly pegged the processor at 25%. Not acceptable.

I am sad that Ubuntu on the desktop didn't work out for me. Oh well, back to XP for at least another year.

13 December 2008

So far, so good

The switch from XP to Ubuntu has gone just fine so far. There are a couple of things that I'd like to work out. One is that I can't seem to connect to my office's SSL based VPN which uses Java. I have the 1.6 JRE from Sun installed and configured but, when I connect to the office VPN, it just freezes up Firefox.

I also need to try the proprietary NVidia Geoforce graphics card driver. The screen transitions and scrolling with Firefox seem pretty jerky without it.

12 December 2008

Ubuntu - here we go again

So, I got a new laptop at work. It's a Dell Vostro 2510 with a 250gb 7200rpm SATA HD and 4 Gb RAM. It came downgraded to XP. I had installed Ubuntu to dual-boot with XP. That worked okay but for some reasons that I can't remember I wasn't quite satisfied and decided to remove Ubuntu. That happened on Monday.

After having settled for XP I started to set up my many applications. After a day or so I noticed that many fundamental programs were not working for me. That led to me conclude that I probably needed to wipe XP and reinstall from the source media that Dell provided.

Well, as long as I'm going to reinstall, why not put Ubuntu back on and force myself to use it for a week? Well, that's where I am now. I'm going to go Ubuntu whole hog for at least a week. The only thing that I know doesn't work is my soundcard. The graphics card will require a proprietary driver to do compiz. Even then, I think that the compiz didn't work quite right so I'll probably not get to use that. Small disappointment.

I hate that I have burned so much time on this laptop when I have more interesting things to be doing at work.

19 November 2008

PHP+IIS="No input file specified"

I was having frustrating problems with PHP on an IIS site complaining that "No input file specified." The frustration came from the fact that I am facile with Apache but know next to nothing about IIS. Ultimately, I came across this link which completely solved my problem.

05 November 2008

Narrative tests? Maybe but probably not.

There is a lot of buzz (probably always has been) for moving to narrative test frameworks and away from classic Unit Test frameworks such as Python's unittest and Ruby's Test::Unit. Having never tried a narrative test framework I'm partial to the classic frameworks for both Python and Ruby if for no other reason then they function identically so the framework is the same for both languages although the grammar is different.

Here's a few blog posts against narrative test frameworks. The first two give good reasons not to prefer them.

09 October 2008

Apache https proxying Mongrel with file upload

We had a Ruby on Rails form that did a POST of some form data including a file upload. The setup was Apache 2.2 proxying a bunch of mongrels on a Windows server. Port 80 was set to redirect to 443 via a rewrite rule. The proxy modules that we were using included proxy_http_module, proxy_balancer_module, and proxy_module. With that setup the file upload was failing. It appeared to timeout and not even hit the controller method that handled the POST.

We made one change to make the file upload work over https to proxied mongrels and that was to enable the proxy_connect_module.

17 September 2008

Python popen deadlocks

I have a python script that calls out to an external program using os.popen3(). That function returns file handle like objects for stdin, stdout, and stderr. I need to examine both stdout and stderr separately.

After running this script for a while I started to see it hang at times. This was especially true after I added in a parameter which increased the amount stderr. It turns out that there is a well known deadlock issue with the popen family of commands. Or, rather, it was not known to me until yesterday.

I started to tinker in my script using select.select() as a solution. That seemed a little foreign to my way of thinking so I changed directions and used separate threads for reading from the stderr and stdout file handles that os.popen2() returns. That seemed to do the trick so I'm pretty happy about that.

11 September 2008

Backups Reorganization pt. 10: Verification

I set up automated verification on the 4 backups servers on every backup job that they run. That simply involved the correct usage of the --verify option to rdiff-backup. This option calculates SHA1 checksums on files in the backups and compares them to the backups metadata. I don't put much stock in this process but feel that it is necessary to actually perform. Besides, the overhead is pretty low since the processing stays entirely on the backups server.

Rdiff-backup also sports a --compare-hash option in addition to --verify. The --compare-hash option actually calculates SHA1 checksums on the source server to compare to what is in the backups metadata. That seems nice but is probably going to be CPU intensive on the source server which I don't want. Still, I might just set it up as a weekly process to run in off hours. We'll see.

One thing that I learned is that Python < 2.4 doesn't support the decorator syntax in 2.4 and up. Rather than monkey with the small syntax differences and determining at runtime which Python version was executing the script, I just decided not to use decorators in one place where it would have been nice (but not necessary) to do so.

The next piece of this project will be to automate the test restores. That'll be slightly more tricky but ought to be satisfying to actually program up.

03 September 2008

Backups Reorganization pt. 9: Retrospect

I finally got around to addressing the backup of the one server that resides behind the firewall. I can't use the approach I had been using, that is, having the backups server initiate the backup because the backups server is itself outside of the firewall. The simplest thing to do is to backup the target server from within the firewall. As it turns out, my company has a couple of dozen terabytes of backup space inside the firewall all controlled by the Retrospect backup software. Setting up the job on the Retrospect server was easy and the initial backup of the target server is running now.

27 August 2008

Shelving AMQP for now

I recently put a lot of effort into working with the relatively new messaging queue protocol, AMQP. This involved reading the v. 0.8 spec., source code of some implementations, the RabbitMQ docs and boards, and the source code of the py-amqplib module.

Ultimately, I could not get the AMQP client and broker to behave in ways that I expected. This is a real disappointment because I had high hopes for it in future projects that will need to be scalable. I'll give it another 6 months to a year and then see where things are.

26 August 2008

Python logging.fileConfig() weirdness

I wrote a draft of a python application that I intended to turn into a Windows service. In it, I used the logging standard module and the fileConfig() function to configure up multiple log handlers and other cool logging related things. All of that worked great when run as a simple script on a Windows box.

When I wrapped up the functionality in a Windows service, however, the logging killed the service almost immediately with a "Bad File Descriptor" IOError exception being raised. I tried quite a lot of things to get it to work again but to no avail.

In the end, I replaced logging.fileConfig() with logging.basicConfig() and lo and behold it worked again. I don't know why it worked. Searching with Google yielded a few things about "atexit" on Windows or some such peculiarity. I don't care because my app. works again.

I'm pretty steamed with the logging module for wasting my afternoon. (I'm actually grateful to have such a wonderful module in the standard library at all.)

12 August 2008

Backups Reorganization pt. 8: Oracle

I was saving the Oracle database backup for last because I felt that it would be the most difficult to do right. As it turns out, the client's chief Oracle user and DBA has already done most of the work. He is dumping the databases that he cares about to a server and directory which I am already backing up. I need to ask him to try do do a test restore though. That's pretty key.

In the meantime, I learned about another server that needs to be backed up but which isn't. The only trick is that that particular server is behind a corporate firewall so my backups servers can't initiate contact. I'm going to have to have the target server initiate a push of data.

Ultimately, that's the model that I want for all backups. That is, a client resides on the target machine which polls a central server for instructions and then, if allowed, makes the connection to the backups server and sends the data.

11 August 2008

Backups Reorganization pt. 7: Zope

The client has three versions of Zope running on three servers. I took the work that I did to discover and dump the Subversion repositories and retrofitted that for Zope databases. It wasn't too difficult and I simply extended an existing class in the package that I am building. Writing the test cases took the most time, as is often the case.

The only thing that was a little different was that I needed to make use of a configuration file in which to put some non-shared setup between the three possible Zope instances. This is easy to do with the ConfigParser class in the Python standard library.

05 August 2008

CCP4 on RHEL5

CCP4 is an open-source, scientific application used in crystallography. It is written in C++ and Fortran and runs best on Linux.

A client needed help today in compiling this application on Red Hat Enterprise Linux (RHEL) 5. This blog posting is intended to provide a brief outline of getting CCP4 running on that platform.

Here's what not to try:

download source
use yum to get all of your prerequisites such as an old gcc compiler for c++ and fortran
source the includes/ccp4.setup file after editing some of its variables
set the CC and CXX environment variables to /usr/bin/gcc34
./configure linux
make

That is guaranteed to fail. It blows up at the point that it is compiling the MMDB for the Clipper subcomponent. Interestingly, telling the configure script to disable clipper has no effect.

One requirement of CCP4 is that it needs Tcl, Tk, and BLT. Tcl and Tk are not well supported in RHEL5 and BLT is non-existent. By not "well supported" I mean that the tk-devel and tcl-devel rpm packages do not exist from the main Red Hat repositories.

To get around this show-stopper, I turned to my CentOS 5.2 box. CentOS is a clone of RHEL. It has a few packages that RHEL does not, especially, the Tcl and Tk libraries and header files.

Here's the recipe of what actually worked:

Build Tk/Tcl/BLT

on a CentOS 5 box, use yum to install tk-devel and tcl-devel
download the Tk/Tcl/BLT tarball from the CCP4 site linked above
unpack the tarball and cd into it
run ./configure (perhaps with --prefix=_____)
note carefully the directories that it says that it will write to
make; make install
create a tar file with the files and locations listed in the configure output
copy the resultant tar file to the RHEL box
extract the files from the tar file into the same directory locations and overwrite, if necessary

Install CCP4

The trick is not to compile from source but, instead, to download the linux binaries. They claim to be known to compile under the ancient RH 8/9. Grab it anyway.
Unpack the binaries, edit and source the includes/ccp4.setup file, and run the BINARY.setup file.

That's it! CCP4 ought to be running on your RHEL 5 machine by this point.

01 August 2008

Backups Reorganization pt. 6: sysadmin slip-up

Earlier in this project, I described how I condensed the many Logical Volumes into one big one for each server performing remote backups. As part of that process I had to not only carefully rework the LVs but I also had to alter the /etc/fstab file for each partition that was removed. I knew that if I left a partition listed in the /etc/fstab file that didn't actually exist then the machine probably would not come back up after a reboot.

Well, that's exactly what happened with one of the four machines. I decided to reboot each machine since each had over 510 days of uptime. One did not come back on line. I had to travel on-site to the secured location to gain access. (Gaining physical access was tricky, as it should be. You don't want to let just anyone walk into your data center!)

Once I got a monitor and keyboard on the target server I saw that it had indeed tried to run fsck against a partition which was not mounted because it no longer existed. It was waiting to for user input to go into maintenance mode. The fix was easy -- simply remove the offending line from /etc/fstab.

30 July 2008

Backups Reorganization pt. 5: SCM

There are two Source Control Management (SCM) systems in use by the client: Subversion and Mercurial.

Subversion
Subversion is a client/server based SCM with a central repository to which clients synchronize. There is a built-in tool available called "svnadmin hotcopy" which, as the "hotcopy" parameter suggests, guarantees that I'll get a consistent, restorable backup of a repository.

Great. Now the problem is, where are the repositories? There are dozens of repositories that need to be dumped and I don't know what they are or where they are located in the root file system. The client creates repositories on the fly wherever they want. That's just the way they do their work. I had to come up with some scripts to help me determine the repository locations by traversing a given part of the file system. Of course, I used Python to do that work.

The gist of the script(s) is that it will enumerate every directory it traverses. It then tries to look for "signature" directories and files. By "signature" I mean that these are specifically named and if you see them then you have probably found a Subversion repository which can be backed up. Python made quick work of this task with the os.walk() method, the Set type, and the accompanying set-based math.

I have put all of my code into a python package so that it'd be easy to build and deploy for the various client sites. I also tried to make this code fairly generic so that it could be re-used with other clients. I used Test-Driven-Development while writing these scripts. Of course, the code is under version control on my company's own Subversion repository.

Is it really worth traversing the file system to search for Subversion repositories? Yes! I found dozens of repositories scattered all around. Each has a file system backup performed on its host partition but none were being dumped as they should to guarantee a consistent, restorable backup.

Mercurial
Mercurial is a distributed revision control system. There is no centralized server by design. I didn't find a lot of good information about getting a good backup. Some items pointed to "hg clone". From some reading, it seems that backups consist of the copies kept on developers' machines. Some folks keep a central computer on which they have a Mercurial client and keep a master copy. I decided that I was just going to let the backups stand as is and rely on the file system level backup. I'm not entirely happy about it and may want to revisit this decision as soon as I take care of the other applications that need attention.

Next up...Oracle.

21 July 2008

Backups Reorganization pt. 4: PostgreSQL

The first thing that I wanted to do in order to backup the data from various applications was to create an identically named space on the file system in which to dump data. I chose /var/data_backup/. Each client machine is using LVM extensively so I had to create the logical volume and auto-mount it to /var/data_backup/.

For the most part, creating the new LV was not a problem because two of the three servers had ample space. The third server, however, needed to have at least one existing LV reduced in order to have enough free space in the volume group.

Here are the steps I used to reduce the size of an existing LVM partition:

locally backup the partition to be reduced using, "rsync --archive"
stop the application that was making use of that partition, in this case, apache
umount the partition
run e2fsck against the partition
run resize2fs to reduce the size of the data on the partition
reduce the size of the partition using lvreduce
mount the partition
start the application that uses the partition

After that I was able to create the standard /var/data_backup/ partition as normal using LVM since there was now enough free space in the volume group.

Now it was time to actually backup the PostgreSQL database. After some research, it seemed that the preferred way to do this was to use pg_dumpall since it dumps both the application databases as well as the system database containing database roles and permissions. At this point I simply had to create the appropriate user and accesses in PostgreSQL and then setup the cron job on each target machine running PostgreSQL. Along the way, I had to set up automatic login on two of the boxes using a .pgpass file.

I tested the restore of the data on a throw away server back at the office. The idea is to read the dumped data into template1 for v. 7.3.* and postgres for 8.1.* on PostgreSQL. The restore appeared to be fine.

Lastly, of course, I needed to perform the one-time setup of the backup servers to grab the /var/data_backup/ directory on each machine. I also grabbed the /etc/ directory too since there is a lot of application configuration information there including pg_hba.conf file.

Next up will be...Subversion.

15 July 2008

Backups Reorganization pt. 3: Sidestepping LVM Snapshots

The previous backup strategy had been to create LVM snapshots on the target machines of the various filesystems, mount the individual snapshots on the filesystem, and then back up the files on the mount point. There was quite a bit of scripting complexity to create and mount the snapshots safely.

As far as I can tell, the advantage of using a snapshot is that rdiff-backup won't complain that files may be actively changing while the backup is in the process of being created. (If someone thinks of other reasons to use LVM snapshots then please comment.)

What using the snapshot does not address is the fact that many applications cannot be backed up directly. Backing up certain application files directly may be reported as successful but would, in fact, yield a corrupted backup. In other words, the backup system backs up the files successfully but then they are not able to be used by the application when they are restored. The client has several applications that fall into this category including: Oracle, PostgreSQL, Zope, Subversion, and others.

So, the next task is a system administrative one of dumping application data, backing up the dumps, and excluding the underlying source files. First up...PostgreSQL.

13 July 2008

Backups Reorganization pt. 2 - Partitioning Overload

The first step in getting a handle on the data backup and recovery strategy is to untangle some of what has been done already. The key problem is that each backup server's volume group was partitioned into dozens of logical volumes. The result is that certain logical volumes are filling up (or are full) while others are under utilized.

The quickest solution would be to reduce a largely empty partition and resize the full one(s). That works but forces to me to guess at if enough space has been allocated or too little. It puts me in the position of having to actively manage space in the logical volumes.

A better approach would be to condense all of the logical volumes into one BIG one. The hope is that there will be enough overall free-space for the medium-term. That's the approach that I took. It meant recreating all of the backups and that took a few days.

In the process of condensing the logical volumes, I learned quite a bit about LVM at this Debian link and liked what I learned. Along the way, I had to readdress the backup scripts that were used. That'll be the topic of a later post.

12 July 2008

Backups Reorganization pt. 1: Introduction

Over the next couple of weeks, I plan to blog with a series of articles describing a project at work to reorganize the backups for a client. The client runs a number of Debian and Fedora Core servers which are backed up remotely. The previous sysadmin. had done a really good job of planning it out and provided some decent documentation. The problem is that the solution doesn't scale and is just complicated enough that another sysadmin. (such as me) has to spend a lot of time getting up to speed.

My goals in reorganizing their backups are as follows:

Ensure that the data being backed up is actually restorable. Perform tests to confirm. For example, one cannot simply backup database files and expect the data to be restorable. Usually, with databases, one has to dump the data into a file and then backup the file. This will be true of other applications as well such as souce control management systems.
Simplify the backup process where possible. For example, dozens of scripts on multiple machines need to be invoked on a daily basis to perform backups. That could all be condensed into one machine and one or two scripts.
Document both the backup and restore procedure for the client. Plan for disaster recovery of the data.
Plan for future requirements, including backing up Windows machines.
Give the client a real confidence that their data is being adequately safeguarded. Again, this comes from periodically testing restores.
Centralize the backup job control onto one machine (currently, they are duplicated across four).
Provide reliable alerting of both failures and successes of backup jobs via configurable methods including syslog, email, and even rss.

26 June 2008

A foray into decorators and closures

Recently, I had to do some debugging on a project's python module that handled SOAP web services. I had written that module a while back and am still kind of proud of it.

The problem was that I needed different error handling via exceptions for different methods. Some of the methods needed one type of error handling and some needed another. What I wanted to do was to centralize the error handling. The alternative (at least the one that came to mind) was to duplicate the error handling in each method. Yuck.

Python has a nice feature called decorators which, to my understanding, allows one to wrap a method with some other code. I wrapped each applicable method that needed this error handling with an error handling decorator.

Enough narrative, let's look at the code!

from socket import error as SocketError
import datetime
import re
from pylib.util import config
from ZSI.client import Binding
from ZSI import ZSIException


class WebServiceError(Exception): pass
class InvalidURLError(WebServiceError): pass
class DatabaseNotReadyError(WebServiceError): pass


def error_handle(fn):
  '''A decorator to wrap web service calls with consistent error handling.'''
  def _execute(*args, **kwargs):
    try:
      result = fn(*args, **kwargs)
    except (SocketError, TypeError), e:
      raise WebServiceError(e)
    except ZSIException, e:
      # Handle SOAP Faults.
      # Check specifically if the SOAP message starts with an ODBC error string.
      # If it does then the FileMaker server is unavailable.
      if re.search(_WebService.odbc_error_pattern, str(e)):
        raise DatabaseNotReadyError(e)
      else:
        raise WebServiceError(e)
    except Exception, e:
      raise WebServiceError("Unhandled exception: %s" % e)
    else:
      return result
  return _execute

Later in the program I invoke the decorator be preceding the method definition. The method that follows is effectively wrapped. For example,

@error_handle
def getSettings(self):
  result = self.binding.getSettings()
  return result

29 May 2008

Apache + SSL + Name-Based-Virtual-Hosts

You might have stumbled on this page looking for a success story solving the riddle of the post's subject. Sorry. That's not the case.

Rather, I'm going to write about the problem, my efforts at workarounds, and finally, the recommended course(s) of action.

The Problem

My Apache web server fronts several applications. The server has several aliased IP names to the IP address. I would like to keep each application in its own virtual host, use name-based virtual hosting, and run some or all of them over SSL.

The Findings

Without having done any background reading, solving that problem ought to be trivial for Apache. It is not and here's why.

When a request is made for an https URL or tcp port 443, there is actually an SSL layer on top of HTTP. The SSL tunnel must be established first and then the http can flow through the tunnel. In establishing the tunnel, SSL does not have access to the HTTP "host" header and so cannot know which name based virtual host to use.

Here's a much better explanation from the folks at Apache:
http://httpd.apache.org/docs/2.2/ssl/ssl_faq.html#vhosts

The bottom line is that they claim that it is impossible to solve the SSL + Name-Based-Virtual-Hosts.

Natually, I didn't believe it (and neither did my manager). So I set out trying all sorts of hacks. All failed...some miserably so.

Ok, so now what?

The last line of the SSL FAQ link above provides a direction: "Using separate IP addresses for different SSL hosts. Using different port numbers for different SSL hosts."

The Conclusion(s)

Option A - Use separate IP addresses for different SSL hosts.

The idea here is that you can continue to use Virtual Hosts but they will not be name-based. Instead you must have multiple NICs in your server. Adding NICs won't scale well but there is a better alternative, espcially if you use Linux. You can simply use one NIC and then use IP aliasing to create clones of the NIC. Each clone gets its own unique IP address. This works great and is a breeze to implement.

The downside here is that externally available IP addresses are scarce and, therefore, costly for most companies. I don't have a good solution for that other than to perhaps look into IPv6 but that is the topic in itself.

Option B - Using different port numbers for different SSL hosts.

This solution is exactly what it seems but worse.
You would run your SSL over arbitrary, unused port numbers such as 444, 445, 446, etc. (443 is the standard HTTP SSL port).

This option disallows using aliased IP names. Instead you have a single ssl certificate and ssl key for the server. All https requests go the only IP name but then you use virtual hosts identified by the requested port number.

For example,
http://appserver.example.org:444/
http://appserver.example.org:445/
http://appserver.example.org:446/

This just loooks clumsy to me. Customers would balk.

Option C - Use a single IP address but unique Locations.

This is a variant of Option B above but maybe more sane.

One would have a single SSL enabled address such as, https://appserver.example.org/. You would then put each of your applications at different "locations."

For example,
https://appserver.example.org/feedback (the location is /feedback)
https://appserver.example.org/outfitter (the location is /outfitter)
https://appserver.example.org/delta (the location is /delta)

Then, in Apache you would define your Location blocks. So, these aren't Virtual Hosts but Location blocks.

The drawback here is that this solution isn't optimal for hosting a customer's application when the customer wants to use a different IP name.

In my opinion, this is clean for a company's internal applications but that's about it.

Appendix

Other directions that I investigated included packet mangling and/or and NAT using Linux's IP tables. IP Tables and Netfilter are awesome and are useful for more than stateful firewalling. Alas, it just didn't seem to solve the problem or else was just tending to be way too complicated when Option A above worked cleanly.

I also briefly implemented nginx and glanced at Squid for their ability to act as proxying servers and to front-end the Apache servers. Nginx, expecially, looked fantastic but didn't seem to have the ability to overcome the SSL + Name-Based-Virtual-Host issue. That issue seems to be a technical one and not an application-specific issue.

I looked into a lot of dark corners and tried out a lot of things in researching this topic, most of which I didn't even mention. Feel free to comment with follow-up or questions.

21 May 2008

MySQL Installer

I spent much of the last week fiddling with Inno Setup to create a MySQL installer for win32. The installer asks for the program install directory (and suggests a default) and the location of the data directory (and suggests a default). It independently installs MySQL server, ODBC, and GUI tools based on check boxes. That's it. It's pretty slick.

Why not just use the .msi installers that MySQL provides, you ask? Well, my installer is going out to dozens of sites and the configuration is set beforehand and is bound to our commercial application. The MySQL .msi installers ask dozens of configuration questions and we're looking to reduce that and to prevent the sysadmins who are installing our application from making wrong choices. For example, we must run the InnoDB engine only and never the non-transactional MyISAM engine. That's a choice the the MySQL .msi installer gives but we do not. My installer also automatically configures users, passwords, databases, tables, and access.

Sadly, this work invalidates my company's decision to standardize on PostgreSQL. Management decided to opt for short-term expediency instead of long-term robustness and scalability. Brilliant. (sarcasm) That won't come back to haunt them. (dripping sarcasm)

13 May 2008

Inno Setup

I have been tasked at work with writing a one-click MySQL installer for win32. Luckily, I have the awesome and amazing Inno Setup packaging tool at my disposal. Inno Setup is full-featured and capable, however, it isn't easy to use for complicated packaging scenarios. One has to discover and use helper scripts to simplify tasks such as decompression and manipulating environment variables. My secret weapon is that I have access to a great set of example scripts that a previous colleague wrote a few years ago. He was a super bright guy but it still took him months to produce our main Inno Setup script. That just shows the power and complexity of the tool. I expect that it'll take me 3-5 more days to get the MySQL installer just right with the proper configuration and addition of odbc and the gui tools.

08 May 2008

Ssh port forwarding, gateway ports, and timeouts

We have a client that firewalls its network to us except for tcp 22 (ssh) from one particular address -- our development server. The goal for us is to be able to remotely use the deployed web application that we developed for them so that we can check out the effect of bug fixes on the live site.

Enter ssh...

I'm not going to talk much about ssh and its better-known capabilities because that's going to be all over the web anyway. Really, go see for yourself.

Instead, we log into our local development server and issue this command:

ssh -g -f -L 8082:localhost:80 user@w.x.y.z sleep 60

where w.x.y.z represents an IP address of the client's server.

What that command does is to forward traffic from localhost:8082 to w.x.y.z:80. It also says to go into the background and to bind port 8082 to all interfaces and not just loopback. That means that anyone who can get to our server's port 8082 can use the tunnel but our development server is behind two firewalls so I'm not worried.

You can then log out of the development server since the ssh tunnel is backgrounded. Open a web browser on your local box, point it to the development server's port 8082, and you'll be seeing the client's web application. Nifty!

Finally, there's a "sleep" command at the end. That's just an example of ssh's ability to perform remote commands. The intent here is to keep the tunnel up for at least x number of sleep seconds but if another application is actively using that tunnel at the same time the tunnel will not close even though the sleep may have already expired. Once the sleep duration has passed and nothing is using the tunnel, the tunnel automatically closes.

That almost works in our case with testing the web application except that http is stateless. That is, it establishes a socket connection every time a request is made and then the socket is closed after the requester receives its response. (Actually, it takes a few seconds for the opened socket to disappear.) For all practical purposes, you probably have a 5-10 second inactivity timeout no matter what you set "sleep" to be as long as you're only using a stateless connection. Hopefully, that'll be enough time for you to test the bug fix on the client's box! One could always just set the sleep duration much higher. Remember, the auto-logout only takes effect once the sleep duration interval has passed.

I got the basics for this auto-timeout solution from http://www.g-loaded.eu/2006/11/24/auto-closing-ssh-tunnels/.

07 May 2008

Better XML needed in Python

I must say that I have just about lost patience with the heroic maintainers and developers of python xml libraries. I'm grateful to have those tools at my disposal but am frustrated that so many key ones seem to languish or are somewhat incomplete. For example, I love elementtree but there still is incomplete XPath support. I really need XPath and have to use 4Suite to get it. I also use 4Suite for applying style sheets. 4Suite is good but is just awkward enough to be not joyful to use. It went for a while without any updates and sat at beta status seemingly forever. Then there's ZSI for SOAP. Well, SOAP is pretty unhappy stuff anyway so you just kind of have to make the best of a bad situation.

Believe it or not, I have to use elementtree, 4Suite, and ZSI on a single project. Oh well, at least I was able to ditch PyXML on that project.

Thanks for the rant space.

24 April 2008

mod_rails is smokin' fast

I recently switched a heavily used production Ruby on Rails server from using a cluster of mongrels proxied by Apache's load balancer to using only mod_rails in Apache. No more mongrels.

Right away, I could tell that the application was noticeably more responsive. I plan to let this run for a week and then I may put up some pre and post benchmark data.

My first impressions:

mod_rails was easy to install using the documentation provided. I needed to install two pre-requisite packages on my CentOS 5.1 server: gcc-c++ and httpd_devel along with their own dependencies. The Yum package manager made quick work of that.
mod_rails has a friendly installer which interpreted into plain English what otherwise would have been hairy looking build errors.
mod_rails was easy to configure. I already had a virtual host set up for the Apache proxied mongrels. I only had to load the mod_rails module, remove the rewrite engine rules (as the docs instructed), restart Apache, and that was it.

My takeaways:

Deploying with mod_rails is much simpler than deploying with proxied mongrels.
It appears that mod_rails is much quicker than mongrel.
There also appears to be a substantial memory savings in not have to run so many mongrels.

07 April 2008

Case sensitivity in databases

Recently, I had to clean up some SQL to be case-insensitive. It was originally written to be run in MySQL which defaults to case-insensitive searches but can be set otherwise. Now, however, the SQL run is being run in PostgreSQL which always performs case-sensitive searches.

Here's an example using Ruby on Rails's Active Record:

Suppose that a name is stored in the providers table as "SMITH".

Below, the set collection will be empty because Smith is not like SMITH.
@search_field = "Smith"
set = Provider.find(:all, :conditions => "name LIKE '%#{@search_field}%'")

But this yields the expected results:
@search_field = "Smith"
set = Provider.find(:all, :conditions => ["LOWER(name) LIKE LOWER(?)", "%#{@search_field}%"])

The reason the second example works is because it uses the database's LOWER() string function to change Smith to smith and SMITH to smith.

And, as it so happens, the UPPER() and LOWER() string functions both exist and work identically in MySQL and PostgreSQL so the code above is portable.

01 April 2008

Rails and OS X

I spent hours today getting a Ruby on Rails application running on an OS X 10.5.2 server.

The problems were numerous but, honestly, it took so long mainly because I never use OS X and didn't have familiarity with some things that I thoroughly understand in Linux and Windows.

Before I could even begin the process, I had to export the tagged version from my company's CVS repository. CVS cannot export empty directories. From the CVS manual:

(Note that cvs export always removes empty directories.) Probably the best way to do this is to always specify `-P'; if you want an empty directory then put a dummy file (for example `.keepme') in it to prevent `-P' from removing it.

The person who initially applied the version tag didn't know to take care with the empty directories. As a result, I had to add an empty '.cvsignore' file into each directory, commit those dozens of .cvsignore files, and then tag them. That wasn't a big deal to do with "find" once I knew what was needed. This was a shotgun approach but was benign.

First, I could not get mysql to listen to the TCP/IP interface despite what was in the /etc/my.cnf file. Turns out that one has to fiddle with the setting with:
serveradmin settings mysql:allowNetwork = yes
(Is that going to persist after a reboot?)

Second, I couldn't get the C-based mysql ruby gem installed. I simply didn't have the header and library files against which it could build. This should not have been a big deal except that there wasn't a clear source package to download from the mysql site. I had to rely on a tech. support person to do that.

Finally, I had to make sure that the various mongrel instances would start automatically. That took some doing and I finally got them configured using launchd and launchctl. Launchd functions as a complete replacement for inetd, init.d, cron, and a few other things. Basically, one has to configure some XML files. It took some time to get my xml files just right.

One weird thing that I encountered is that if I told the mongrel service to bind to the "localhost" interface it would only do so using IP v.6. I had to specifically tell it to use "127.0.0.1" to coerce it to use IP v.4.

The Good:
Once I got into launchd, I began to appreciate how it could simplify a lot of the traditional unix utilities and configurations.

The Bad:
OS X and running Rails on OS X is a niche market for my company. I can't help but to think what a waste of time this was because I won't be able to leverage this experience over many installations. Even if I could, I wouldn't want to because I have zero interest in OS X.

The Ugly:
In reading the launchd documentation, the Apple folks come off as unjustifiably arrogant. Check out this quote from http://developer.apple.com/macosx/launchd.html:

If this is not sufficient for your job, you need to modify your program or file an enhancement request against launchd with Apple.

03 March 2008

Shibboleth and Rails

My company wrote a Rails application for a client at a major university that was intended to authenticate via Shibboleth to the campus's single-sign-on site. The application is fronted by a cluster of mongrel servers which are proxied by Apache 2.2.

The idea is to authenticate against Shibboleth which will set the REMOTE_USER variable in Apache to the authenticated user's username. The Rails application would then authorize against that REMOTE_USER variable.

The Shibboleth setup went okay and I was able to secure the site. The problem, however, was that the REMOTE_USER variable was not showing up in the request.env hash in Rails as it should.

Ultimately, the fix was to include this chunk of code in my Apache virtual host config file.

RewriteEngine On
RewriteCond %{LA-U:REMOTE_USER} (^(.+?)@.+$)
RewriteRule . - [E=RU:%2]
RequestHeader add X-Forwarded-User %{RU}e

What this did was to take the remote user variable that gets set by Shibboleth and write it to a new header. That header, in turn, gets renamed by mongrel as HTTP_X_FORWARDED_USER. In the Rails app, I could then read the value of request.env['HTTP_X_FORWARDED_USER'] in my authorization routine.

Oh, and that regex in the RewriteCond line was to strip of the "@foo.edu" that gets set even though Shibboleth was configured to strip that off.

Credit for this solution goes to the author of the post at http://www.ruby-forum.com/topic/83067#151189.

22 February 2008

ActiveRecord and SQL Server

I was having a really frustrating problem in that long-running queries to a (MS) SQL Server database were resulting in a TimeoutError being raised. After much research I determined that the issue was with the ADO driver's CommandTimeout property. It needed to be set to be much higher. From circumstantial evidence, it seemed to timeout after 30 seconds.

Setting that property via ActiveRecord was tricky. The property can be set directly against the underlying DBI database handle object. That object can be gotten at through the ActiveRecord::Base.connection object.

The name of my ActiveRecord model is IDX. So, I set the timeout with:
IDX.connection.instance_variable_get(:@connection).handle.instance_variable_get(:@handle).setproperty('CommandTimeout', 7200)
That works but, are you kidding me?!

Credit goes to the author of the blog at http://www.enterpriseetc.com/post/Ruby-in-Practice.aspx. His update at the very end ultimately pointed me in the right direction.

19 February 2008

Moving to PostgreSQL

We had a mini-throwdown on database servers yesterday. After some research, we determined that we could not use MySQL in one of our commercial products without purchasing the commercial license for MySQL. That license costs $600/year/installation. That kind of money adds up over 40 installations.

Given that, we have decided to try using PostgreSQL instead. It is a robust database with plenty of good software APIs written around it. We need to be able to connect with it via PHP, Phython, Ruby, and ODBC and as far as I can tell, all of those will work just fine. PostgreSQL, besides being robust, is absolutely free for us to use in our commercial application without a requirement to open source the code. We have even decided to try using it in our custom applications too.

I spent some time in postgres yesterday and got up to speed quickly. It has a wealth of enterprise-worthy features but I'm sure that we won't be using those. If I'm wrong, then, hey, at least they are already in the server and we can scale up as needed. One explicit technology decision that my company has made is to use the back-end database simply as a data store and not to use its advanced features such as views, triggers, and stored procedures, unless it becomes necessary on a case-by-case basis. We are pushing all of that business logic into the application layer so that it will all be in one place and will be easily maintainable by multiple developers. The goal is for our code to be database agnostic. The trade-off is a sligtly slower application.

Now my issue is selecting getting an ORM for Python to work with both MySQL and PostgreSQL. More to come...

28 January 2008

SOAP

I'm back on Ruby-ish stuff this week. The work I'm doing has a big SOAP component. The built-in Ruby SOAP library does quite well for my needs. It basically kicks the Python equivalent, ZSI, around the block.

Not that SOAP is the only web service in town but, for my projects, I have to use it because the upstream providers only offer SOAP.

Only a few more days until the season premiere of Lost! Yay!!!!

25 January 2008

Test driven development

Well, I have been off on a major project that uses Python. Man, I love that language.

What has been great with this project is that I finally have enough comfort and experience with testing that I am naturally doing test-driven development. I don't feel good about my code unless I have reasonable tests in place.