sobenews.jpg (4657 bytes)

The CargoWerks Case Study:

Simple Scripting with UNIX Makes for a Rich User Experience

By Jerry Sievers, Lead Architect www.SobeGirl.com

jerry@jerrysievers.com

 

Preface

I have not always been involved in the Porn Industry. To the chagrin of my darling wife I am working with www.SobeGirl.com because I see the enormous earning potential of the online adult industry and of course Erik is my dear friend. I don’t know if you all realize this but I was involved in a particularly serious motorcycle accident about one year ago. The results of this terrible night were my losing one leg and about 80 percent of my eyesight. My handicap makes working with computer technology challenging but not impossible. I want to thank you all for welcoming me into your virtual community. A place that transcends the physical and a place where I feel right at home. Emails are welcome.

Introduction

www.cargowerks.com is a freight auction system for the Airline Freight Industry. It is one of those typical Web start-ups that never got off the ground because of the stock market crash. It was a novel idea whose time never came. The following article describes what went into this site and explains how it was done. Many of the ideas contained here can be applied to adult websites creating a better user experience, more conversions and higher retention.

The Framework

The first thing I had to consider for this project was exactly what software would be needed. This would be more complex than a site consisting of only static pages. Our site needed to process and store data about the user accounts and whatever else the bidding system required. CargoWerks.com is therefore a dynamic website. Many of the pages are output by programs, which allows the content to be changed depending on conditions determined at runtime. We have some simple static pages too of course.

For the operating system of the server I chose Linux since it is the platform I am most experienced with and offers everything we need to host the site. Actually, since this and all software decisions were mine alone, I didn't spend even a moment to consider anything else. I do not use Windows, Macintosh or any other OS myself. Had this site needed to be run on a server with a non-Unix type OS, other programmers would be more qualified than me for this work.

Well, Linux is but one of many Unix variants and I would not hesitate to run the code on another Unix like Solaris (Sun Microsystem's Unix version). Though, the development machine would be my own home workstation, which is a Linux box. Porting to another Linux machine would certainly be the easiest. Next, a decision on what web server to use was made. This resulted in us using the Apache server. This is an excellent and open source coded server that runs on all Unixes. I have tested this software on Linux for the past few years and built other web sites with it. The newest stable production quality version was obtained and installed on the dev machine.

Installation of this software requires building from source code. This process is known as compilation. Mature software such as Apache can be usually built and installed with a few simple commands; 'tar' to unpack the archive, 'make' to run the compiler and 'make install' to move the object files into the runtime directories. The Apache software needed to be outfitted with one extension. This was a module called "modSSL". This is required so that the web server can use SSL (secure socket layer) for encrypted transactions.

Users of CargoWerks would be inputting sensitive information like charge account numbers during the registration process. Because of this, some of our web pages would be output in secure mode. Subsequent input would also be sent using SSL. So far we had the OS and web server. We also needed some programming language to write the CGIs with. CGI stands for Common Gateway Interface. This is a program that runs on the web server and outputs HTML code to be rendered by the client's browser. Static pages are simply written in HTML and output by the server upon each hit, never changing unless edited by the author. It's the CGIs that are used to make logic decisions on the server and then output whatever is appropriate.

Writing the Code

Perl5 was the language of choice for this task. It too is open source and is normally already installed on Linux machines. An extension called CGI.pm would be employed to make CGI coding easier. This module allows the programmer to do most everything with HTML forms using very concise commands. The fine details are done in the module which frees the programmer to concentrate on the more abstract problems. All that remained to be chosen was some type of RDBMS (relational database management system). For this I chose Postgres for many of the same reasons as the aforementioned software. There are at least three popular open source RDBMSs that we could have used. These are mSQL, mySQL and Postgres. Over the past few years, I have tried all of these and decided that Postgres was the most feature rich, well documented and mature of them.

Once again, the latest package was installed prior to the start of development. A supplied Perl module called PG.pm was used to interface our Perl coding to the database. In the past I have architected other websites that stored and retrieved used data without using any RDBMS at all. You can get away with this on very simple sites. CargoWerks however, went beyond this in what would be needed on the backend. Furthermore, I had avoided using RDBMSs before becoming masterful with them, Once I got used to using a "proper" database, there would be no turning back.

So the dev system was now complete. We had OS, webserver, SSL, Perl and Postgres. Finally, some rough coding could begin. In order to work on a programming project, the planner has to break the overall job into small tasks and workout and test these individually before integration. .Erik from www.SobeGirl.com, the designer got busy working on a layout and came up with the look and feel that you see when visiting the site. This consists of basically the top logo, left side navigation links and the main content portion. Main content takes up whatever screen real estate remains below the logo and right of the links. An orange background is placed behind the links to set them off from the content part. It is the content window that I would be concentrating on. With the exception of a few static pages like "help" and "about", most of what would be displayed is generated dynamically by CGIs.

Digging into the Architecture

Although the planning and architecture was very informal, a top-down approach was used in the development. This is where the abstract of the project is thought out and then finer details considered till arriving at the point that coding begins. This is to say that the whole big picture is kept in mind all the time. Sometimes a skeleton of the website will be in place before all modules are fully operational. Erik and I had done just this so the Owner of CargoWerks could check and approve that the navigation was what he intended. Some links would simply take you to a page that said "help page goes here" or similar. The first thing I provided for was the routine to output the user registration form. This was a CGI written to output all form fields to the user's browser when they visited the registration page.

Initially, the CGI did only the output part. Once the form had all fields needed the CGI was extended to receive the input from the user upon submission. Before our program would attempt to open the database, all user input had to be validated. Essentially, we determine that each field must meet certain criteria to be allowed into the database. For example, a username had to be between a predetermined minimum and maximum length of characters. Also, it's typical to allow only alphanumeric characters like A through Z and 0 through 9 here. Passwords also had restrictions on their length but we would allow any character in this field. Two password fields are given to help insure that the user does not make an error. Other fields were validated to check that they were not blank. An address, for example, must be given but there is really no easy way for the CGI to check that it's actually valid. In cases like this, we assume correctness if it's non-blank and continue.

Before even calling the validation subroutines, we attempt to clean up the user's input. All that this consists of is to remove any leading and trailing white space from the data. A user might enter an address and type a few spaces after the last visible character. This can cause confusion later if saved prior to cleanup. The code for this is named "clean-user-input". This is run right before another routine named "validate-input". Modular programming dictates that subroutines are used to handle discreet tasks and should be named to give an indication of their purpose. It is also possible to embed comments in the code to make it easier for another programmer to understand how the code works. Error messages were needed in the event that user input was unacceptable. This happens quite often and the graceful handling of these errors can make the difference in a well or poorly designed CGI. I have seen lots of sites which will output a special page to indicate errors and then tell the user "please hit the 'back button' to try again". I think this is bad and decided long ago that my CGIs would try to show error messages on top and re-output the form with all previous input intact. This requires the least amount of movement by the user to solve the problem. This method was used on CargoWerks too.

Tricks and Tips with CGI

There is some trick to this as well. In fact, recall that the CGI in its infancy did nothing but output a form. At this point in program development and testing, it was able to output the initial blank form, have input posted back to it, clean this input, validate the input and possibly display errors. Quite a lot but still not capable of storing anything in the database. What remained to be done, in cases when all input was OK, was connect to the database and save everything. To do this we use the PG.pm library's connect method to establish a channel into our database. Having done this, we insert the new data into the appropriate tables and add the new member to the password file. System side errors could occur at this point and we are prepared to log these into a Unix mechanism called 'syslog'. This event (which has never happened in production) would display a generic error page to the user asking them to alert the administrators.

Well-designed programs, to coin a phrase, expect the unexpected. So, a split second later and all data is saved. Now we have a new member and need to take them someplace. This is a milestone reached but only the tip of the iceberg as they say. The way the boss man wanted the site to behave after registration was, that the new member would be shown a "Thank You" page. They would then presumably go from there directly to the lane chooser page. This is where the user determines which lanes they are interested in for whatever operation’s they will be doing. A "lane" is a combination of two airports. The first being the airport of origin, the second being the destination airport. There was an arbitrary limit of three lanes that could be chosen at one time. These choices are saved in the database and presented to the user as defaults the next time they use the lane chooser. The storage and retrieval of lane preferences was added some time after initial development of this CGI. What this means is that at first, a user always had to make selections before proceeding. This would have been tedious and error prone. To make the site "remember" lane preferences meant adding structures to the database and additional code to do the data insertion and update.

The three choices were saved with the username as the key. For example, if user "smith" chose Miami to O'Hare, a record would be added to the database. Here's what such an insert looks like in ANSI SQL (structured query language,

' INSERT INTO lane_preferences )'smith', 'MIA', 'ORD'

There would be three such inserts, one for each combination of lanes. But there is more to it than this. I'll not go crazy with code snippets here. What we always do first, is try to load existing lane preferences. This will fail if this is the users very first time on the lane chooser. If it does fail, we proceed as if nothing wrong happened. The only result is that they will be shown the system defaults. After this first time, we will always be able to pull up their last choices from the database. So, let's walk through what happens the second and following times. The user accesses the lane chooser page. Their last settings are retrieved from the database table. This is used to preset the drop -down menus the user sees on their browser. User submits the page, perhaps changing the settings. Upon submission, we attempt to delete all records from the table, which correspond to this user. Then we insert the new settings in their place.

Soft Failures with Graceful Response

Interesting to note that we are using "soft failure" on these operations. This is sometimes useful and can save a few steps. What I mean is, that instead of checking that records exist and then updating them or inserting them if no records exist, we just always do a delete and insert. This lets us avoid having to "special case" the user's first time on lane chooser. Decisions like this may be influenced by performance considerations. This is to say that it might result in fewer CPU cycles or less disk access time by using the UPDATE SQL command when records exist. But in order to reduce code complexity, the DELETE/INSERT pair was implemented instead. Moving on to the next page after lane choice requires us to determine if the user is an airline or shipper. This is because from this point on, they do different things. And of course, this is easy since our database contains this info. All that's done is to consult the "account_type" field in the user data.

The way the data is organized is that shipper and airline account information is held in the same table. One of these fields is set to determine which type the user is. The fact that the account data is identical except for this one item made a compelling case for having both user types in one table. Other jobs where the divergence is greater might warrant storing the data in separate tables. All of these decisions are part of planning the database schema (a word for "structure" in RDBMS parlance). At times there may be no bids whatsoever in the system. In this case, neither airlines nor visitors will make it any further than the lane chooser page. This is because there will be no valid lanes with cargo in them from which to choose from. Normally however, the lane chooser would show that there is some cargo under one or more of the continents or regions shown at the header of the output table. It is from these lanes that the three choices are made for viewing by airlines.

Shippers are allowed to proceed past this point so that they can create bids from any arbitrary lanes. Also, depending on the type of user, the buttons presented are subject to change. Airlines are prompted to "view lanes", shippers are prompted to "create new bids". In this respect, each type of user has two distinct different responsibilities. Let's imagine that the site was just launched and there are no bids in the system. A shipper does the first part of the transaction asking to send his freight on lane so and so. Then he waits to get an email indicating that the bid was accepted. He might actively check the lanes by viewing them to see if there is any change. I should point out that the lane view page will automatically refresh every five minutes as long as the user's browser is on that page. A shipper can decide to remove one of their bids at any time prior to its acceptance.

Automating Chores

A shipper may also create additional bids in the system. There is a series of robot programs running on the server to do various housekeeping chores with the database. The most active of these is a program called "explore-bids:. It does precisely this. Every hour the program runs and for each run, it removes bids which have remained in the system longer than a certain number of hours. An administrator can determine this parameter. Or, alternately, the bid expiration program does not have to be run at all. This is what we did initially to keep the system full of bids. Normally, once business picked up, it would be essential to keep only the newest bids alive. It's reasonable to assume that bids remaining for a long time are "dogs" and probably will not result in any airlines accepting them.

 

Demo the System

At this point I'll mention that there are two dummy accounts that can be used for testing. I invite you to try both of them to see how the site works. First log in as the shipper. username "S", password "S". Go ahead and create a few bids and notice that you may receive error messages and hopefully an informative suggestion on how to proceed. Note that the username and password are both in upper case. Fairly complex validation takes place on the data input by the shipper during this phase. I shall not elaborate on it further here as it is generally along the lines of what we've covered in the discussion about registration. Next, you'll need to log in as an airline. Be sure to exit your browser completely. This should prevent it from trying to reconnect as shipper again. Now log in as username " a", password 'A'. View the same lanes that you had put freight in as shipper. Then go ahead and accept one or more of the bids. This completes the process from the users' perspective. We still have a little processing in the backend upon bid acceptance. All that's left is for the system to look up the email addresses of both shipper and airline plus other details of the transactions and You will not get this email unless you edit the account info while logged in to set the email address properly. The CargoWerks system then generates emails to both parties.

Jerry Sievers March 2001

jerry@jerrysievers.com

for www.sobegirl.com