New York Linux Scene Journal: Open Source Software in the Financial Industry

Open Source Software in the Financial Industry

by Brendan W. McAdams

There are actually two 'cases' I want to focus on. One involves some basic bits of mod_perl that can be used to create clean, dynamic, database-driven content. The other involves building a software-based 'SSL accelerator' to add SSL to a non-SSL-capable web server, saving yourself and your firm fistfuls of dollars. All of the examples I will discuss can be (and have been) built with free, open source software. The exception is the database work, which in my case was done in Sybase. However, as I don't really go into the SQL and don't touch upon the stored procedures, this shouldn't be an issue; MySQL or Postgres should suffice nicely.

A Simple Mod_Perl System

The first section we will discuss is building dynamic content using mod_perl. Acquiring and building a mod_perl-capable Apache server, as well as a back-end database, is left as an exercise for the reader; someone in NYLXS (or I) should be able to help if you get stuck. We will assume a good working knowledge of Perl and some knowledge of Apache for this discussion. I will introduce mod_perl as best I can, so you should be able to approach this with little or no mod_perl knowledge. For further study, see the excellent "Writing Apache Modules with Perl and C" by Lincoln Stein and Doug MacEachern (copies of which will NOT be available after this lecture...).

The techniques I want to introduce are the basics of those I have used to create the most successful electronic (all web-based) Municipal Bond Market currently available. We do a large number of trades and see an enormous amount of volume on a daily basis. Apache scales well under that traffic, and mod_perl has helped alleviate much of our old 'CGI-related stress'.

When I first joined my current firm in February of 2001, they were struggling to stabilize their system. Much of the front end of the site was running on Perl-based CGIs built by a contract programmer. The contractor did a fairly good job programming, although most of his success is due to our Software Architect, who designed an amazing front end that relies on some of the best Sybase stored procedure work I've ever seen. However, our traffic was stressing the limits of CGI. It was light at the time compared to what we see now, but still pretty decent. The core problem with the CGI model is that every time a request is made for an application, the server must fork a new process and start a fresh Perl interpreter to compile and run the script. This in itself is costly. In our case, each request also opened a new connection to Sybase and returned anywhere from 100 to 3000 rows of data; our most popular category returned an average of 1100 rows. You can imagine what that was doing to our web server.

Our contractor left the firm, and I was asked to come up with a way to increase throughput temporarily. This was meant to tide us over until our next-generation product went live, which was expected within 2 months. What I ended up developing was a ground-up rewrite of the original CGI code that kept the back-end Sybase architecture. Within a month of my starting we were live with the new code and saw throughput beyond our expectations; we tripled our trades within 2 months.

It took another 9 months for our next-generation system to go live, due to changing market conditions, changing feature requirements, etc. In the meantime our Perl system continued to serve our customers in an outstanding manner. Meanwhile, many of our competitors (all using closed source, proprietary software) closed their doors for good or bought one another up (or bought each other and then closed).

Our new system is now the centerpiece of our technology. It's a Java-based system that handles large transactions, et al. in ways that my system couldn't, limited more by architectural design than by software capabilities. Even so, the Perl system still plays a very active role in our organization. It serves as a light, fast, robust and scalable 'prototyping' stage to demonstrate new technologies and features that our firm is pursuing. Rapid prototyping is just something our firm has been unable to do with Java. Enough ranting and self-indulgence, however; let's focus on building some basic mod_perl.

The first step in setting up your mod_perl application is creating a 'startup.pl' file. This file lets you control the generic Perl environment that every mod_perl process will use. One of the *most important* things it does is set the library path for your mod_perl code, which is made up of ordinary Perl modules. Those of you who know Perl know that it has several internal paths (kept in the @INC array) that it searches for code. These are usually something like /usr/local/lib/perl5/site_perl/5.6.0001/blahblah.

You may also know that you can add 'extra' paths to this list. This is useful for users without root privileges who want to install their own modules, or for people who want to manage homegrown modules without installing them in the system path. You can do this either by adding the directory to the environment variable PERL5LIB, or with a 'use lib $directory;' statement in your actual Perl file. We are going to use the latter method to set our Perl directory.
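Here is a minimal sketch of what 'use lib' does. It assumes a hypothetical /usr/local/apache/lib/perl directory, which does not need to exist for the demo. The directory is put at the front of @INC at compile time, so Perl searches it before the system paths. Setting PERL5LIB in the environment has a similar effect, applied when the interpreter starts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical library directory; 'use lib' does not require it to exist.
use lib '/usr/local/apache/lib/perl';

# 'use lib' runs at compile time and prepends the directory to @INC,
# so modules there are found before copies in the system paths.
print "Module search order:\n";
print "  $_\n" for grep { !ref } @INC;    # skip any code-ref hooks in @INC
```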

In your Apache config directory (typically ~apache/conf), create the file startup.pl and start it out as a standard Perl script. It will look something like this:

#!/usr/bin/perl

Now, start a BEGIN block and lay it out as follows:

BEGIN {
    use Apache ();
    use lib Apache->server_root_relative('lib/perl');
}

1; # this is a perlism; all perl modules should return true.

Once we finish configuring it, startup.pl will be loaded once, when Apache starts, and every child process inherits whatever it loaded. This is why we use it for things like preloading commonly used modules and setting the library path. What we have done here is two things:

We've preloaded the Apache module. This is the base module for all mod_perl functions.

We've set the root library path for our project to ~apache/lib/perl.

Apache->server_root_relative takes a path relative to the server root (the base directory of the Apache installation, as set by ServerRoot) and returns the full path. If my server root is /usr/local/apache, Apache->server_root_relative('lib/perl') will return the string "/usr/local/apache/lib/perl".
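To make the path arithmetic concrete, here is a sketch that runs outside Apache. The function server_root_relative_sketch is a hypothetical stand-in written for illustration, not part of mod_perl. It uses File::Spec to resolve a relative path against an assumed server root of /usr/local/apache.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical stand-in for Apache->server_root_relative, for illustration only.
# Relative paths are joined onto the server root; absolute paths come back
# unchanged (apart from File::Spec's cleanup of redundant slashes).
sub server_root_relative_sketch {
    my ($server_root, $path) = @_;
    return File::Spec->rel2abs($path, $server_root);
}

print server_root_relative_sketch('/usr/local/apache', 'lib/perl'), "\n";
# prints /usr/local/apache/lib/perl
```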

There are some other things we can do in startup.pl, like preloading modules we use frequently. This saves each child from loading them the first time the app runs, since Apache has already loaded them at startup. We can do this by simply "use"ing the module in startup.pl (outside of BEGIN); this is left as an exercise for the reader.
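One possible shape for that exercise is sketched below. To keep it runnable outside Apache, it preloads core modules; in a real startup.pl you would list the modules your handlers actually use (DBI, for instance). The empty import list () compiles the module without importing anything into startup.pl's own namespace. %INC then shows what has been loaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Preloads go outside the BEGIN block. The empty list means "compile the
# module now, but don't import any of its functions into this file".
use POSIX ();
use Data::Dumper ();
use File::Spec ();

# %INC maps each loaded module file to where it came from; a later 'use' of
# the same module in a handler finds it here and skips recompiling it.
for my $module (qw(POSIX Data::Dumper File::Spec)) {
    (my $file = "$module.pm") =~ s{::}{/}g;
    printf "%-14s loaded from %s\n", $module, $INC{$file};
}

1; # startup.pl is pulled in with PerlRequire, so it must return true
```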

Now that we've created startup.pl, we need to tell Apache about it. Open up your Apache config and add two directives at the top of it:

PerlRequire conf/startup.pl

PerlFreshRestart on

We have done two things here. The first is what we just talked about: we've told Apache about our startup file. Now when Apache starts up, it will 'require' the startup file. This gives startup.pl one more benefit besides preloading: we can now check whether code compiles. Basically, any module you preload in startup.pl MUST COMPILE, or Apache will not start.

This is more or less a good thing. I try to preload all my modules if possible, because I would rather find out that code is broken when Apache refuses to start than when a client gets a 500 error page.

The second line we've added to the config file is PerlFreshRestart. It tells Apache that when it restarts (usually via apachectl restart), it needs to dump all modules in memory and reload them from disk. This is a good thing: it makes sure your new code gets loaded if you changed anything. And again, preloading modules in startup.pl has a useful side effect. If there is an issue with any of the new code, the server will simply refuse to restart until the code loads cleanly.

Now let's work on building a simple application. It will consist of one database table and two parts. The first is a login module that hooks into your browser's "Login/Password" popup box and uses a database for authentication, instead of those .htpasswd flat files we all hate so much. The second is a simple mod_perl app that uses a light template system to dump out your user info from the database. This is a simple application, but it will show you the ins and outs of basic mod_perl coding. You should leave here with a basic understanding of how to build something, and hopefully be able to convert your CGIs over to mod_perl.

Let's start with our database table. I'm going to set up a hypothetical MySQL table here, and be as generic as possible.

We want a table that contains the following information about a user:

First Name

Last Name

Username

Password

Email Address

User Type (Admin, User)

A simple sql script for this would be something like this:

CREATE TABLE users (
    first_name varchar(65) NOT NULL,
    last_name varchar(65) NOT NULL,
    username varchar(65) NOT NULL UNIQUE,
    password varchar(65) NOT NULL,
    email varchar(100) NOT NULL,
    user_type enum('Admin','User') NOT NULL,
    PRIMARY KEY (username)
);

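This works in MySQL. The enum column type is MySQL-specific rather than ANSI SQL, so for PostgreSQL and other databases you'll need to replace it. For example, use user_type varchar(5) NOT NULL CHECK (user_type IN ('Admin','User')); the rest should work as-is. Create this table in a database. For my examples I've used the database "nylxs".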

Now we're going to create our authentication layer. This simple module is going to hook into Apache's authentication phase, read the username and password sent by the web browser, and try to authenticate them. You can read more about Apache's request phases in the mod_perl book mentioned above; they're well beyond the scope of this discussion.

Let's start our module. Change into your Apache base directory. If you haven't already, create the lib/perl directory (mkdir -p lib/perl). Now change into lib/perl and make an Apache directory (mkdir Apache). Perl package names are case-sensitive, so the capital A matters. Apache is the conventional top-level package name for mod_perl modules; it tells anyone looking at your module that it requires mod_perl. Change into Apache and create an NYLXS directory (mkdir NYLXS). We are creating a package hierarchy: after Apache, we always want either our company or project name, so our code is easy to identify. In this case, if someone requests Apache::NYLXS::Foo, Perl will look for the file Apache/NYLXS/Foo.pm under ~apache/lib/perl.
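The name-to-file mapping can be sketched in a few lines of Perl. The helper module_to_file is hypothetical, written only to show the rule Perl applies when you 'use' or 'require' a module: '::' becomes '/' and '.pm' is appended. The library directory is the assumed /usr/local/apache/lib/perl from above.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: turn a package name into the relative file name
# that Perl searches for in each @INC directory.
sub module_to_file {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;
    return "$file.pm";
}

my $lib  = '/usr/local/apache/lib/perl';      # assumed server_root/lib/perl
my $file = module_to_file('Apache::NYLXS::DBAuth');
print "$file\n";                                # Apache/NYLXS/DBAuth.pm
print File::Spec->catfile($lib, $file), "\n";   # /usr/local/apache/lib/perl/Apache/NYLXS/DBAuth.pm
```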

Let's change into the NYLXS directory and open a new file named DBAuth.pm.

Type the following:

package Apache::NYLXS::DBAuth;

What we've done is explicitly declare the namespace for this file: Apache::NYLXS::DBAuth. It must match, case included, the name we will give Apache in the config file. It is important in mod_perl to clarify your namespaces. I'll explain why later.

Now add this:

use strict;

use Apache::Constants qw(:common);

We've done two things here. The first is use strict;, which I hope most of you Perl programmers recognise. We're going to digress a bit here, so bear with me...

One of the features of Apache as a server is its pre-forking model. A parent process opens the listening sockets; it sometimes runs with root privileges, so it can bind to privileged ports like 80. When it starts up, it spawns a set number of child processes that are slated to handle requests. These usually run as nobody or some other unprivileged user. When a request comes in, one of the idle children accepts it and does all the work. However, the child does not go away when it's done; it stays alive to serve the next request. Any global variables set within it, any memory allocated and never freed, and all sorts of other fun things bad programmers do will remain. It is therefore important to run under use strict;. It forces you to declare your variables, so you have to think about their scope and you avoid accidental globals. Keeping variables properly scoped minimizes memory growth and keeps data from leaking between requests.

One of the first mistakes I made when programming mod_perl was this very thing: I couldn't figure out why data from previous requests kept showing up in later ones. It turned out I wasn't properly scoping my variables, and the data was persisting from request to request within the same child process. This can sometimes be a good thing, but we won't go into that today; you may find certain times in your own explorations when it's useful.
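The bug is easy to reproduce without Apache. The sketch below simulates one child process serving two requests in a row by calling the same subroutine twice. The package NYLXS::LeakDemo and both subroutines are hypothetical. The file-scoped lexical @seen is created once, when the file is compiled, so it keeps its contents between calls. The data declared inside the subroutine is fresh on every call.

```perl
package NYLXS::LeakDemo;    # hypothetical package for this demo
use strict;
use warnings;

# BUG: file-scoped lexical. Under mod_perl a module is compiled once per
# child, so this array survives from one request to the next.
my @seen;

sub leaky_handler {
    my ($user) = @_;
    push @seen, $user;
    return join(',', @seen);
}

# FIX: declare per-request data inside the handler, so each call starts clean.
sub clean_handler {
    my ($user) = @_;
    my @seen_this_request = ($user);
    return join(',', @seen_this_request);
}

package main;

# Simulate one child process handling two requests in a row.
print NYLXS::LeakDemo::leaky_handler('alice'), "\n";   # alice
print NYLXS::LeakDemo::leaky_handler('bob'), "\n";     # alice,bob  <- leaked from the previous request
print NYLXS::LeakDemo::clean_handler('alice'), "\n";   # alice
print NYLXS::LeakDemo::clean_handler('bob'), "\n";     # bob
```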

The Apache::Constants package exports constants for Apache's status codes, such as OK, DECLINED and AUTH_REQUIRED; the :common tag imports the most commonly used ones.

Here's the rest of the script:

use DBI;

sub handler {
    my $r = shift;

    # Did the browser send Basic credentials? If not, hand Apache's status back.
    my ($res, $password) = $r->get_basic_auth_pw;
    return $res if $res != OK;

    # what user are they authed as? (the Basic Auth username their browser sent)
    my $remote_user = $r->connection->user;

    unless ($remote_user && $password) {
        $r->note_basic_auth_failure;
        $r->log_reason("Username or Password returned empty", $r->filename);
        return AUTH_REQUIRED;
    }

    my $dbh = DBI->connect("DBI:mysql:database=nylxs;host=localhost", "root", "")
        or die $DBI::errstr;

    # The ? placeholders let DBI quote the values, so user input can't alter the SQL.
    my $sql = q|
        SELECT username, password, user_type
        FROM users
        WHERE username = ? AND password = ?
    |;
    my $sth = $dbh->prepare($sql);
    $sth->execute($remote_user, $password);
    my $cursor = $sth->fetchrow_hashref;
    $sth->finish;
    $dbh->disconnect;

    unless ($cursor && $cursor->{'username'} eq $remote_user) {
        $r->note_basic_auth_failure;
        # Log the username only; passwords should never end up in the error log.
        $r->log_reason("Authentication Failure Detected. Username: '$remote_user'", $r->filename);
        return AUTH_REQUIRED;
    }

    # Pass state down to later phases of this request
    $ENV{'USER_TYPE'}   = $cursor->{'user_type'};
    $ENV{'REMOTE_USER'} = $remote_user;

    return OK;
}

1; # this is a perlism; all perl modules should return true.

You may recognize the DBI stuff for the database; if you don't I recommend reading the docs for DBI on CPAN to get up to speed.
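One DBI point is worth spelling out: never interpolate the username or password directly into the SQL text. The sketch below only builds strings, so it runs without a database. The helper interpolated_login_sql is hypothetical and shows the pattern to avoid. A crafted password closes the quote and adds a condition that is true for every row. With DBI, the safe pattern is ? placeholders in the SQL, with the values passed to execute().

```perl
use strict;
use warnings;

# Hypothetical helper showing the UNSAFE pattern: user input pasted straight
# into the SQL text. Do not build real queries this way.
sub interpolated_login_sql {
    my ($user, $password) = @_;
    return "SELECT username FROM users WHERE username = '$user' AND password = '$password'";
}

print interpolated_login_sql('bmcadams', 'nylxs'), "\n";

# A crafted password ends the string literal and appends OR '1'='1'. Since AND
# binds tighter than OR, the WHERE clause is now true for every row.
print interpolated_login_sql('bmcadams', "x' OR '1'='1"), "\n";
```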

There are two new concepts here which you should note:

sub handler.

By default, mod_perl looks in a module for a subroutine called "handler"; this is the subroutine it calls when a directive such as PerlAuthenHandler names just the package. It passes in the Apache request object, conventionally named $r. You can pass $r along to other subroutines as well. It provides methods for accessing Apache internals, getting information about the request, sending information back to the client, and many other things.

You'll note that we call several methods on $r. One is $r->get_basic_auth_pw, which returns a two-element list. The first element is a status code: OK if the browser sent Basic credentials, otherwise a status such as AUTH_REQUIRED. The second is the password the client sent. $r->connection->user steps into Apache's subobject for the client connection and returns the username the client sent. We use all of this information to determine whether this user is authorised to use the site.
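To show what get_basic_auth_pw is working from, here is a sketch of the HTTP Basic scheme itself. The browser sends an Authorization header containing the base64 encoding of "username:password". The helper parse_basic_auth is hypothetical and exists only for illustration; under mod_perl, Apache parses the header for you. Note that base64 is an encoding, not encryption, which is one reason to put SSL in front of a site that uses Basic authentication.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: split an HTTP Basic "Authorization" header value into
# (username, password), or return an empty list if it isn't Basic auth.
sub parse_basic_auth {
    my ($header) = @_;
    return unless defined $header && $header =~ /^Basic\s+(\S+)\s*$/i;
    my $decoded = decode_base64($1);
    # The password may itself contain ':', so split on the first one only.
    my ($user, $pass) = split /:/, $decoded, 2;
    return unless defined $pass;
    return ($user, $pass);
}

# What a browser would send for user "bmcadams", password "nylxs".
my $header = 'Basic ' . encode_base64('bmcadams:nylxs', '');   # '' = no trailing newline
print "Authorization: $header\n";

my ($user, $pass) = parse_basic_auth($header);
print "user=$user pass=$pass\n";   # user=bmcadams pass=nylxs
```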

We select the data out of the database and verify it. If anything goes wrong along the way, we return AUTH_REQUIRED, which sends the "Authorization Required" (401) response to the browser. The first few times a browser gets this, it will ask the user for the username/password combo again. After a few failures it may give up.

When we're done, we pass some data into environment variables. Since we are in Apache's authentication phase, we can use %ENV to pass some stateful information down to later phases (such as composing and returning the document, which we'll cover later). This is one of the simplest ways to keep per-request state in mod_perl; $r->notes and $r->pnotes are request-scoped alternatives built for the same purpose.

Your module is now ready to test. For a simple test, let's do three quick things. First, make sure you add a test user with full info to the database.

Example:

insert into users (first_name,last_name,username,password,email,user_type) values
('Brendan','McAdams','bmcadams','nylxs','rit@jacked-in.org','Admin');

Then, in the htdocs directory, create a quick testauth.html file that contains some text, like "Hi, NYLXS".

Finally, we need to add a quick section to the Apache config:

<Files testauth.html>
    Options All
    AuthName "NYLXS mod_perl Auth Test"
    AuthType Basic
    PerlAuthenHandler Apache::NYLXS::DBAuth
    require valid-user
</Files>

This tells Apache that it needs to require a valid authenticated user for the file testauth.html, and that it should let our Apache::NYLXS::DBAuth module handle the authentication.

Start Apache and try accessing your document; everything should work!

You can email me at rit@jacked-in.org if you have any problems/questions when you try this at home.